{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# How to Use the Goodreads Scraper"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By [Melanie Walsh](https://melaniewalsh.org/) and [Maria Antoniak](https://maria-antoniak.github.io/)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This Jupyter notebook walks you through how to collect Goodreads reviews and Goodreads book metadata (number of reviews, number of ratings, shelves, etc.) with our [Goodreads Scraper](https://github.com/maria-antoniak/goodreads-scraper) Python scripts.\n",
    "\n",
    "*Note: We recommend running these Python scripts from the command line. They may not work consistently from a Jupyter notebook environment, so this tutorial is intended mostly for demonstration purposes.*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Download the `goodreads-scraper` GitHub Repository"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, you need to download the `goodreads-scraper` GitHub repository (which includes this Jupyter notebook). You can either clone the repository with git:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
       "Cloning into 'goodreads-scraper'...\n",
       "remote: Enumerating objects: 129, done.\n",
       "remote: Counting objects: 100% (129/129), done.\n",
       "remote: Compressing objects: 100% (105/105), done.\n",
       "remote: Total 129 (delta 64), reused 53 (delta 22), pack-reused 0\n",
      "Receiving objects: 100% (129/129), 31.45 KiB | 7.86 MiB/s, done.\n",
      "Resolving deltas: 100% (64/64), done.\n"
     ]
    }
   ],
   "source": [
    "!git clone https://github.com/maria-antoniak/goodreads-scraper.git"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or you can download the repository as a zip file by clicking the following link:  \n",
    "https://github.com/maria-antoniak/goodreads-scraper/archive/master.zip"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you haven't already, move into the `goodreads-scraper` directory:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 415,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/Users/melaniewalsh/Goodreads-Project/notebooks/goodreads-scraper\n"
     ]
    }
   ],
   "source": [
    "%cd goodreads-scraper/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Install required Python packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: beautifulsoup4 in /Users/melaniewalsh/opt/anaconda3/lib/python3.7/site-packages (from -r requirements.txt (line 1)) (4.6.0)\n",
      "Requirement already satisfied: selenium in /Users/melaniewalsh/opt/anaconda3/lib/python3.7/site-packages (from -r requirements.txt (line 2)) (3.141.0)\n",
      "Requirement already satisfied: lxml in /Users/melaniewalsh/opt/anaconda3/lib/python3.7/site-packages (from -r requirements.txt (line 3)) (4.5.2)\n",
      "Requirement already satisfied: geckodriver-autoinstaller in /Users/melaniewalsh/opt/anaconda3/lib/python3.7/site-packages (from -r requirements.txt (line 4)) (0.1.0)\n",
      "Requirement already satisfied: chromedriver-py in /Users/melaniewalsh/opt/anaconda3/lib/python3.7/site-packages (from -r requirements.txt (line 5)) (86.0.4240.22)\n",
      "Requirement already satisfied: urllib3 in /Users/melaniewalsh/opt/anaconda3/lib/python3.7/site-packages (from selenium->-r requirements.txt (line 2)) (1.25.9)\n"
     ]
    }
   ],
   "source": [
    "!pip install -r requirements.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Inspect Goodreads Book ID File"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To collect Goodreads data about particular books, you need to create a plain text file with the books' corresponding Goodreads IDs.\n",
    "\n",
    "Goodreads IDs can be found at the end of a book's Goodreads URL. For example, the book ID for *Little Women* — https://www.goodreads.com/book/show/1934.Little_Women — is `1934.Little_Women`.\n",
    "\n",
    "You can inspect the the sample book ID file included in the repository as an example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 419,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1885.Pride_and_Prejudice\n",
      "2657.To_Kill_a_Mockingbird\n",
      "4671.The_Great_Gatsby\n",
      "10210.Jane_Eyre\n",
      "1371.The_Iliad\n",
      "6185.Wuthering_Heights\n",
      "5107.The_Catcher_in_the_Rye\n",
      "11337.The_Bluest_Eye\n",
      "320.One_Hundred_Years_of_Solitude\n",
      "36529.Narrative_of_the_Life_of_Frederick_Douglass\n",
      "1934.Little_Women\n",
      "12296.The_Scarlet_Letter\n",
      "18423.The_Left_Hand_of_Darkness\n",
      "14942.Mrs_Dalloway\n",
      "38447.The_Handmaid_s_Tale"
     ]
    }
   ],
   "source": [
    "!cat goodreads_classics_sample.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Collect Goodreads Book Metadata\n",
    "## Number of ratings, number of reviews, average rating, shelves, lists, etc."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Make new directory for book metadata"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Make a new directory to hold the book metadata output:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 417,
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir -p classic_book_metadata"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Run book metadata collection script"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Below we run the `get_books.py` script and direct it to write its output files to the `classic_book_metadata` directory. With `--format csv`, the aggregated book metadata is saved as a CSV file (in addition to a JSON file)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 429,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2020-10-15 12:26:16.651934 get_books.py: Scraping 1885.Pride_and_Prejudice...\n",
      "2020-10-15 12:26:16.651985 get_books.py: #1 out of 15 books\n",
      "=============================\n",
      "2020-10-15 12:27:07.449440 get_books.py: Scraping 2657.To_Kill_a_Mockingbird...\n",
      "2020-10-15 12:27:07.449466 get_books.py: #2 out of 15 books\n",
      "=============================\n",
      "2020-10-15 12:27:56.722658 get_books.py: Scraping 4671.The_Great_Gatsby...\n",
      "2020-10-15 12:27:56.722674 get_books.py: #3 out of 15 books\n",
      "=============================\n",
      "2020-10-15 12:28:49.392396 get_books.py: Scraping 10210.Jane_Eyre...\n",
      "2020-10-15 12:28:49.392417 get_books.py: #4 out of 15 books\n",
      "=============================\n",
      "2020-10-15 12:29:38.987517 get_books.py: Scraping 1371.The_Iliad...\n",
      "2020-10-15 12:29:38.987533 get_books.py: #5 out of 15 books\n",
      "=============================\n",
      "2020-10-15 12:30:29.021774 get_books.py: Scraping 6185.Wuthering_Heights...\n",
      "2020-10-15 12:30:29.021790 get_books.py: #6 out of 15 books\n",
      "=============================\n",
      "2020-10-15 12:31:17.578016 get_books.py: Scraping 5107.The_Catcher_in_the_Rye...\n",
      "2020-10-15 12:31:17.578033 get_books.py: #7 out of 15 books\n",
      "=============================\n",
      "2020-10-15 12:32:06.722567 get_books.py: Scraping 11337.The_Bluest_Eye...\n",
      "2020-10-15 12:32:06.722586 get_books.py: #8 out of 15 books\n",
      "=============================\n",
      "2020-10-15 12:32:55.962465 get_books.py: Scraping 320.One_Hundred_Years_of_Solitude...\n",
      "2020-10-15 12:32:55.962481 get_books.py: #9 out of 15 books\n",
      "=============================\n",
      "2020-10-15 12:33:47.293910 get_books.py: Scraping 36529.Narrative_of_the_Life_of_Frederick_Douglass...\n",
      "2020-10-15 12:33:47.293925 get_books.py: #10 out of 15 books\n",
      "=============================\n",
      "2020-10-15 12:34:26.864034 get_books.py: Scraping 1934.Little_Women...\n",
      "2020-10-15 12:34:26.864057 get_books.py: #11 out of 15 books\n",
      "=============================\n",
      "2020-10-15 12:35:17.065387 get_books.py: Scraping 12296.The_Scarlet_Letter...\n",
      "2020-10-15 12:35:17.065412 get_books.py: #12 out of 15 books\n",
      "=============================\n",
      "2020-10-15 12:36:05.814597 get_books.py: Scraping 18423.The_Left_Hand_of_Darkness...\n",
      "2020-10-15 12:36:05.814614 get_books.py: #13 out of 15 books\n",
      "=============================\n",
      "2020-10-15 12:36:57.055810 get_books.py: Scraping 14942.Mrs_Dalloway...\n",
      "2020-10-15 12:36:57.055825 get_books.py: #14 out of 15 books\n",
      "=============================\n",
      "2020-10-15 12:37:47.932972 get_books.py: Scraping 38447.The_Handmaid_s_Tale...\n",
      "2020-10-15 12:37:47.932992 get_books.py: #15 out of 15 books\n",
      "=============================\n",
      "2020-10-15 12:38:37.224221 get_books.py:\n",
      "\n",
      "🎉 Success! All book metadata scraped. 🎉\n",
      "\n",
      "Metadata files have been output to /classic_book_metadata\n",
      "Goodreads scraping run time = ⏰ 0:12:20.573623 ⏰\n"
     ]
    }
   ],
   "source": [
    "!python get_books.py --book_ids_path goodreads_classics_sample.txt --output_directory_path classic_book_metadata --format csv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's import pandas so we can read in the aggregated CSV and see what the data looks like:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>book_id_title</th>\n",
       "      <th>book_id</th>\n",
       "      <th>book_title</th>\n",
       "      <th>isbn</th>\n",
       "      <th>isbn13</th>\n",
       "      <th>year_first_published</th>\n",
       "      <th>author</th>\n",
       "      <th>num_pages</th>\n",
       "      <th>genres</th>\n",
       "      <th>shelves</th>\n",
       "      <th>lists</th>\n",
       "      <th>num_ratings</th>\n",
       "      <th>num_reviews</th>\n",
       "      <th>average_rating</th>\n",
       "      <th>rating_distribution</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>320.One_Hundred_Years_of_Solitude</td>\n",
       "      <td>320</td>\n",
       "      <td>One Hundred Years of Solitude</td>\n",
       "      <td>8420471836</td>\n",
       "      <td>9788420471839</td>\n",
       "      <td>1967</td>\n",
       "      <td>Gabriel García Márquez</td>\n",
       "      <td>417</td>\n",
       "      <td>['Fiction', 'Classics', 'Magical Realism', 'Li...</td>\n",
       "      <td>{'to-read': 660690, 'currently-reading': 34629...</td>\n",
       "      <td>{'Best': 80020, 'Books': 7, 'Favorite': 184, '...</td>\n",
       "      <td>751336</td>\n",
       "      <td>30615</td>\n",
       "      <td>4.08</td>\n",
       "      <td>{'5 Stars': 360307, '4 Stars': 201622, '3 Star...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4671.The_Great_Gatsby</td>\n",
       "      <td>4671</td>\n",
       "      <td>The Great Gatsby</td>\n",
       "      <td>0684801523</td>\n",
       "      <td>9780684801520</td>\n",
       "      <td>1925</td>\n",
       "      <td>F. Scott Fitzgerald</td>\n",
       "      <td>200</td>\n",
       "      <td>['Classics', 'Fiction', 'Academic &gt; School', '...</td>\n",
       "      <td>{'to-read': 1065506, 'classics': 43826, 'curre...</td>\n",
       "      <td>{'Books': 34, 'Best': 111, '100': 26, '1001': ...</td>\n",
       "      <td>3765257</td>\n",
       "      <td>67035</td>\n",
       "      <td>3.92</td>\n",
       "      <td>{'5 Stars': 1341705, '4 Stars': 1263237, '3 St...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>38447.The_Handmaid_s_Tale</td>\n",
       "      <td>38447</td>\n",
       "      <td>The Handmaid's Tale</td>\n",
       "      <td>isbn not found</td>\n",
       "      <td>isbn13 not found</td>\n",
       "      <td>1985</td>\n",
       "      <td>Margaret Atwood</td>\n",
       "      <td>314</td>\n",
       "      <td>['Fiction', 'Classics', 'Science Fiction &gt; Dys...</td>\n",
       "      <td>{'to-read': 847109, 'currently-reading': 13201...</td>\n",
       "      <td>{'Best': 71, 'Books': 3429, '100': 71, 'The': ...</td>\n",
       "      <td>1453397</td>\n",
       "      <td>70541</td>\n",
       "      <td>4.11</td>\n",
       "      <td>{'5 Stars': 614053, '4 Stars': 515080, '3 Star...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1885.Pride_and_Prejudice</td>\n",
       "      <td>1885</td>\n",
       "      <td>Pride and Prejudice</td>\n",
       "      <td>0553213105</td>\n",
       "      <td>9780553213102</td>\n",
       "      <td>1813</td>\n",
       "      <td>Jane Austen</td>\n",
       "      <td>279</td>\n",
       "      <td>['Classics', 'Fiction', 'Romance', 'Historical...</td>\n",
       "      <td>{'to-read': 1227780, 'currently-reading': 1358...</td>\n",
       "      <td>{'Best': 82, 'Books': 229, 'All': 285, '100': ...</td>\n",
       "      <td>2988238</td>\n",
       "      <td>67087</td>\n",
       "      <td>4.26</td>\n",
       "      <td>{'5 Stars': 1611736, '4 Stars': 814196, '3 Sta...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>18423.The_Left_Hand_of_Darkness</td>\n",
       "      <td>18423</td>\n",
       "      <td>The Left Hand of Darkness</td>\n",
       "      <td>0441478123</td>\n",
       "      <td>9780441478125</td>\n",
       "      <td>1969</td>\n",
       "      <td>Ursula K. Le Guin</td>\n",
       "      <td>304</td>\n",
       "      <td>['Science Fiction', 'Fiction', 'Fantasy', 'Cla...</td>\n",
       "      <td>{'to-read': 116814, 'currently-reading': 8036,...</td>\n",
       "      <td>{'Best': 6, 'Science': 3, 'Favorite': 41, '100...</td>\n",
       "      <td>119642</td>\n",
       "      <td>9131</td>\n",
       "      <td>4.07</td>\n",
       "      <td>{'5 Stars': 46515, '4 Stars': 44532, '3 Stars'...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>14942.Mrs_Dalloway</td>\n",
       "      <td>14942</td>\n",
       "      <td>Mrs. Dalloway</td>\n",
       "      <td>0156628708</td>\n",
       "      <td>9780156628709</td>\n",
       "      <td>1925</td>\n",
       "      <td>Virginia Woolf</td>\n",
       "      <td>194</td>\n",
       "      <td>['Classics', 'Fiction', 'Literature', 'Novels'...</td>\n",
       "      <td>{'to-read': 197554, 'currently-reading': 14288...</td>\n",
       "      <td>{'Best': 5, 'The': 19, 'Books': 5, '1001': 528...</td>\n",
       "      <td>222692</td>\n",
       "      <td>9912</td>\n",
       "      <td>3.79</td>\n",
       "      <td>{'5 Stars': 69055, '4 Stars': 73711, '3 Stars'...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>36529.Narrative_of_the_Life_of_Frederick_Douglass</td>\n",
       "      <td>36529</td>\n",
       "      <td>Narrative of the Life of Frederick Douglass</td>\n",
       "      <td>0486284999</td>\n",
       "      <td>9780486284996</td>\n",
       "      <td>1845</td>\n",
       "      <td>Frederick Douglass</td>\n",
       "      <td>158</td>\n",
       "      <td>['Nonfiction', 'History', 'Classics', 'Biograp...</td>\n",
       "      <td>{'to-read': 89227, 'currently-reading': 4205, ...</td>\n",
       "      <td>{'Books': 299, 'Best': 76, '100': 10275, 'Blac...</td>\n",
       "      <td>93986</td>\n",
       "      <td>4377</td>\n",
       "      <td>4.04</td>\n",
       "      <td>{'5 Stars': 38536, '4 Stars': 30871, '3 Stars'...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>12296.The_Scarlet_Letter</td>\n",
       "      <td>12296</td>\n",
       "      <td>The Scarlet Letter</td>\n",
       "      <td>0679783385</td>\n",
       "      <td>9780679783381</td>\n",
       "      <td>1850</td>\n",
       "      <td>Nathaniel Hawthorne</td>\n",
       "      <td>279</td>\n",
       "      <td>['Classics', 'Fiction', 'Historical &gt; Historic...</td>\n",
       "      <td>{'to-read': 275850, 'classics': 19898, 'curren...</td>\n",
       "      <td>{'Books': 50, 'Best': 193, '100': 53, 'The': 1...</td>\n",
       "      <td>704758</td>\n",
       "      <td>14822</td>\n",
       "      <td>3.41</td>\n",
       "      <td>{'5 Stars': 130486, '4 Stars': 209958, '3 Star...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>2657.To_Kill_a_Mockingbird</td>\n",
       "      <td>2657</td>\n",
       "      <td>To Kill a Mockingbird</td>\n",
       "      <td>1439550417</td>\n",
       "      <td>9781439550410</td>\n",
       "      <td>1960</td>\n",
       "      <td>Harper Lee</td>\n",
       "      <td>324</td>\n",
       "      <td>['Classics', 'Fiction', 'Historical &gt; Historic...</td>\n",
       "      <td>{'to-read': 13346, 'currently-reading': 60856,...</td>\n",
       "      <td>{'Books': 338, 'Best': 53, '100': 162, '1001':...</td>\n",
       "      <td>4487646</td>\n",
       "      <td>91126</td>\n",
       "      <td>4.28</td>\n",
       "      <td>{'5 Stars': 2356364, '4 Stars': 1329450, '3 St...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>1371.The_Iliad</td>\n",
       "      <td>1371</td>\n",
       "      <td>The Iliad</td>\n",
       "      <td>0471377589</td>\n",
       "      <td>9780471377580</td>\n",
       "      <td>750</td>\n",
       "      <td>Homer</td>\n",
       "      <td>683</td>\n",
       "      <td>['Classics', 'Poetry', 'Fiction', 'Fantasy &gt; M...</td>\n",
       "      <td>{'to-read': 1224, 'currently-reading': 16098, ...</td>\n",
       "      <td>{'Books': 28, 'The': 2, 'Big': 7332, 'Best': 2...</td>\n",
       "      <td>351102</td>\n",
       "      <td>7209</td>\n",
       "      <td>3.87</td>\n",
       "      <td>{'5 Stars': 117275, '4 Stars': 114644, '3 Star...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>10210.Jane_Eyre</td>\n",
       "      <td>10210</td>\n",
       "      <td>Jane Eyre</td>\n",
       "      <td>1551111802</td>\n",
       "      <td>9781551111803</td>\n",
       "      <td>1847</td>\n",
       "      <td>Charlotte Brontë</td>\n",
       "      <td>532</td>\n",
       "      <td>['Classics', 'Fiction', 'Romance', 'Historical...</td>\n",
       "      <td>{'to-read': 726189, 'currently-reading': 52444...</td>\n",
       "      <td>{'Best': 193, 'Books': 131, '100': 1522, 'Big'...</td>\n",
       "      <td>1620275</td>\n",
       "      <td>42152</td>\n",
       "      <td>4.13</td>\n",
       "      <td>{'5 Stars': 743818, '4 Stars': 500212, '3 Star...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>11337.The_Bluest_Eye</td>\n",
       "      <td>11337</td>\n",
       "      <td>The Bluest Eye</td>\n",
       "      <td>0307278441</td>\n",
       "      <td>9780307278449</td>\n",
       "      <td>1970</td>\n",
       "      <td>Toni Morrison</td>\n",
       "      <td>216</td>\n",
       "      <td>['Fiction', 'Classics', 'Historical &gt; Historic...</td>\n",
       "      <td>{'to-read': 122475, 'currently-reading': 7043,...</td>\n",
       "      <td>{'Best': 2, 'Books': 40, \"Oprah's\": 9, 'Most':...</td>\n",
       "      <td>172557</td>\n",
       "      <td>9089</td>\n",
       "      <td>4.06</td>\n",
       "      <td>{'5 Stars': 65916, '4 Stars': 64896, '3 Stars'...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>5107.The_Catcher_in_the_Rye</td>\n",
       "      <td>5107</td>\n",
       "      <td>The Catcher in the Rye</td>\n",
       "      <td>0316769487</td>\n",
       "      <td>9780316769488</td>\n",
       "      <td>1951</td>\n",
       "      <td>J.D. Salinger</td>\n",
       "      <td>277</td>\n",
       "      <td>['Classics', 'Fiction', 'Young Adult', 'Litera...</td>\n",
       "      <td>{'to-read': 906715, 'classics': 31330, 'curren...</td>\n",
       "      <td>{'Books': 7365, 'Best': 1193, '100': 1952, '10...</td>\n",
       "      <td>2729479</td>\n",
       "      <td>57355</td>\n",
       "      <td>3.81</td>\n",
       "      <td>{'5 Stars': 934635, '4 Stars': 843907, '3 Star...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>6185.Wuthering_Heights</td>\n",
       "      <td>6185</td>\n",
       "      <td>Wuthering Heights</td>\n",
       "      <td>0553212583</td>\n",
       "      <td>9780553212587</td>\n",
       "      <td>1847</td>\n",
       "      <td>Emily Brontë</td>\n",
       "      <td>464</td>\n",
       "      <td>['Classics', 'Fiction', 'Romance', 'Gothic', '...</td>\n",
       "      <td>{'to-read': 660990, 'currently-reading': 53160...</td>\n",
       "      <td>{'Best': 1324, 'Books': 19, 'What': 169, '100'...</td>\n",
       "      <td>1338683</td>\n",
       "      <td>37119</td>\n",
       "      <td>3.86</td>\n",
       "      <td>{'5 Stars': 481611, '4 Stars': 411836, '3 Star...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>1934.Little_Women</td>\n",
       "      <td>1934</td>\n",
       "      <td>Little Women</td>\n",
       "      <td>0517214628</td>\n",
       "      <td>9780517214626</td>\n",
       "      <td>1868</td>\n",
       "      <td>Louisa May Alcott</td>\n",
       "      <td>449</td>\n",
       "      <td>['Classics', 'Fiction', 'Historical &gt; Historic...</td>\n",
       "      <td>{'to-read': 670541, 'currently-reading': 69094...</td>\n",
       "      <td>{'Books': 122, 'Best': 42, '100': 1522, 'Favor...</td>\n",
       "      <td>1677066</td>\n",
       "      <td>30874</td>\n",
       "      <td>4.09</td>\n",
       "      <td>{'5 Stars': 719003, '4 Stars': 539909, '3 Star...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                        book_id_title  book_id  \\\n",
       "0                   320.One_Hundred_Years_of_Solitude      320   \n",
       "1                               4671.The_Great_Gatsby     4671   \n",
       "2                           38447.The_Handmaid_s_Tale    38447   \n",
       "3                            1885.Pride_and_Prejudice     1885   \n",
       "4                     18423.The_Left_Hand_of_Darkness    18423   \n",
       "5                                  14942.Mrs_Dalloway    14942   \n",
       "6   36529.Narrative_of_the_Life_of_Frederick_Douglass    36529   \n",
       "7                            12296.The_Scarlet_Letter    12296   \n",
       "8                          2657.To_Kill_a_Mockingbird     2657   \n",
       "9                                      1371.The_Iliad     1371   \n",
       "10                                    10210.Jane_Eyre    10210   \n",
       "11                               11337.The_Bluest_Eye    11337   \n",
       "12                        5107.The_Catcher_in_the_Rye     5107   \n",
       "13                             6185.Wuthering_Heights     6185   \n",
       "14                                  1934.Little_Women     1934   \n",
       "\n",
       "                                     book_title            isbn  \\\n",
       "0                 One Hundred Years of Solitude      8420471836   \n",
       "1                              The Great Gatsby      0684801523   \n",
       "2                           The Handmaid's Tale  isbn not found   \n",
       "3                           Pride and Prejudice      0553213105   \n",
       "4                     The Left Hand of Darkness      0441478123   \n",
       "5                                 Mrs. Dalloway      0156628708   \n",
       "6   Narrative of the Life of Frederick Douglass      0486284999   \n",
       "7                            The Scarlet Letter      0679783385   \n",
       "8                         To Kill a Mockingbird      1439550417   \n",
       "9                                     The Iliad      0471377589   \n",
       "10                                    Jane Eyre      1551111802   \n",
       "11                               The Bluest Eye      0307278441   \n",
       "12                       The Catcher in the Rye      0316769487   \n",
       "13                            Wuthering Heights      0553212583   \n",
       "14                                 Little Women      0517214628   \n",
       "\n",
       "              isbn13  year_first_published                  author  num_pages  \\\n",
       "0      9788420471839                  1967  Gabriel García Márquez        417   \n",
       "1      9780684801520                  1925     F. Scott Fitzgerald        200   \n",
       "2   isbn13 not found                  1985         Margaret Atwood        314   \n",
       "3      9780553213102                  1813             Jane Austen        279   \n",
       "4      9780441478125                  1969       Ursula K. Le Guin        304   \n",
       "5      9780156628709                  1925          Virginia Woolf        194   \n",
       "6      9780486284996                  1845      Frederick Douglass        158   \n",
       "7      9780679783381                  1850     Nathaniel Hawthorne        279   \n",
       "8      9781439550410                  1960              Harper Lee        324   \n",
       "9      9780471377580                   750                   Homer        683   \n",
       "10     9781551111803                  1847        Charlotte Brontë        532   \n",
       "11     9780307278449                  1970           Toni Morrison        216   \n",
       "12     9780316769488                  1951           J.D. Salinger        277   \n",
       "13     9780553212587                  1847            Emily Brontë        464   \n",
       "14     9780517214626                  1868       Louisa May Alcott        449   \n",
       "\n",
       "                                               genres  \\\n",
       "0   ['Fiction', 'Classics', 'Magical Realism', 'Li...   \n",
       "1   ['Classics', 'Fiction', 'Academic > School', '...   \n",
       "2   ['Fiction', 'Classics', 'Science Fiction > Dys...   \n",
       "3   ['Classics', 'Fiction', 'Romance', 'Historical...   \n",
       "4   ['Science Fiction', 'Fiction', 'Fantasy', 'Cla...   \n",
       "5   ['Classics', 'Fiction', 'Literature', 'Novels'...   \n",
       "6   ['Nonfiction', 'History', 'Classics', 'Biograp...   \n",
       "7   ['Classics', 'Fiction', 'Historical > Historic...   \n",
       "8   ['Classics', 'Fiction', 'Historical > Historic...   \n",
       "9   ['Classics', 'Poetry', 'Fiction', 'Fantasy > M...   \n",
       "10  ['Classics', 'Fiction', 'Romance', 'Historical...   \n",
       "11  ['Fiction', 'Classics', 'Historical > Historic...   \n",
       "12  ['Classics', 'Fiction', 'Young Adult', 'Litera...   \n",
       "13  ['Classics', 'Fiction', 'Romance', 'Gothic', '...   \n",
       "14  ['Classics', 'Fiction', 'Historical > Historic...   \n",
       "\n",
       "                                              shelves  \\\n",
       "0   {'to-read': 660690, 'currently-reading': 34629...   \n",
       "1   {'to-read': 1065506, 'classics': 43826, 'curre...   \n",
       "2   {'to-read': 847109, 'currently-reading': 13201...   \n",
       "3   {'to-read': 1227780, 'currently-reading': 1358...   \n",
       "4   {'to-read': 116814, 'currently-reading': 8036,...   \n",
       "5   {'to-read': 197554, 'currently-reading': 14288...   \n",
       "6   {'to-read': 89227, 'currently-reading': 4205, ...   \n",
       "7   {'to-read': 275850, 'classics': 19898, 'curren...   \n",
       "8   {'to-read': 13346, 'currently-reading': 60856,...   \n",
       "9   {'to-read': 1224, 'currently-reading': 16098, ...   \n",
       "10  {'to-read': 726189, 'currently-reading': 52444...   \n",
       "11  {'to-read': 122475, 'currently-reading': 7043,...   \n",
       "12  {'to-read': 906715, 'classics': 31330, 'curren...   \n",
       "13  {'to-read': 660990, 'currently-reading': 53160...   \n",
       "14  {'to-read': 670541, 'currently-reading': 69094...   \n",
       "\n",
       "                                                lists  num_ratings  \\\n",
       "0   {'Best': 80020, 'Books': 7, 'Favorite': 184, '...       751336   \n",
       "1   {'Books': 34, 'Best': 111, '100': 26, '1001': ...      3765257   \n",
       "2   {'Best': 71, 'Books': 3429, '100': 71, 'The': ...      1453397   \n",
       "3   {'Best': 82, 'Books': 229, 'All': 285, '100': ...      2988238   \n",
       "4   {'Best': 6, 'Science': 3, 'Favorite': 41, '100...       119642   \n",
       "5   {'Best': 5, 'The': 19, 'Books': 5, '1001': 528...       222692   \n",
       "6   {'Books': 299, 'Best': 76, '100': 10275, 'Blac...        93986   \n",
       "7   {'Books': 50, 'Best': 193, '100': 53, 'The': 1...       704758   \n",
       "8   {'Books': 338, 'Best': 53, '100': 162, '1001':...      4487646   \n",
       "9   {'Books': 28, 'The': 2, 'Big': 7332, 'Best': 2...       351102   \n",
       "10  {'Best': 193, 'Books': 131, '100': 1522, 'Big'...      1620275   \n",
       "11  {'Best': 2, 'Books': 40, \"Oprah's\": 9, 'Most':...       172557   \n",
       "12  {'Books': 7365, 'Best': 1193, '100': 1952, '10...      2729479   \n",
       "13  {'Best': 1324, 'Books': 19, 'What': 169, '100'...      1338683   \n",
       "14  {'Books': 122, 'Best': 42, '100': 1522, 'Favor...      1677066   \n",
       "\n",
       "    num_reviews  average_rating  \\\n",
       "0         30615            4.08   \n",
       "1         67035            3.92   \n",
       "2         70541            4.11   \n",
       "3         67087            4.26   \n",
       "4          9131            4.07   \n",
       "5          9912            3.79   \n",
       "6          4377            4.04   \n",
       "7         14822            3.41   \n",
       "8         91126            4.28   \n",
       "9          7209            3.87   \n",
       "10        42152            4.13   \n",
       "11         9089            4.06   \n",
       "12        57355            3.81   \n",
       "13        37119            3.86   \n",
       "14        30874            4.09   \n",
       "\n",
       "                                  rating_distribution  \n",
       "0   {'5 Stars': 360307, '4 Stars': 201622, '3 Star...  \n",
       "1   {'5 Stars': 1341705, '4 Stars': 1263237, '3 St...  \n",
       "2   {'5 Stars': 614053, '4 Stars': 515080, '3 Star...  \n",
       "3   {'5 Stars': 1611736, '4 Stars': 814196, '3 Sta...  \n",
       "4   {'5 Stars': 46515, '4 Stars': 44532, '3 Stars'...  \n",
       "5   {'5 Stars': 69055, '4 Stars': 73711, '3 Stars'...  \n",
       "6   {'5 Stars': 38536, '4 Stars': 30871, '3 Stars'...  \n",
       "7   {'5 Stars': 130486, '4 Stars': 209958, '3 Star...  \n",
       "8   {'5 Stars': 2356364, '4 Stars': 1329450, '3 St...  \n",
       "9   {'5 Stars': 117275, '4 Stars': 114644, '3 Star...  \n",
       "10  {'5 Stars': 743818, '4 Stars': 500212, '3 Star...  \n",
       "11  {'5 Stars': 65916, '4 Stars': 64896, '3 Stars'...  \n",
       "12  {'5 Stars': 934635, '4 Stars': 843907, '3 Star...  \n",
       "13  {'5 Stars': 481611, '4 Stars': 411836, '3 Star...  \n",
       "14  {'5 Stars': 719003, '4 Stars': 539909, '3 Star...  "
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "metadata = pd.read_csv(\"classic_book_metadata/all_books.csv\")\n",
    "metadata"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's plot the total number of Goodreads ratings for each book, sorted by number of ratings:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.axes._subplots.AxesSubplot at 0x116ade3d0>"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAkUAAAEFCAYAAADtzpMwAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAgAElEQVR4nOzde5yWVb3//9dbVA6CZGpGtHPU8MRBiFGT1DDd7u0uzyIVbsNKv+ZW08IiNUPblaVFmYqiIdo2NS1LxdTE8yFlOINm/tSxRCu1wPAMfn5/rHXj5e0999z3HBgG3s/HYx5zXeta11rruhmYxVrrWh9FBGZmZmbruvW6ugFmZmZmawJ3iszMzMxwp8jMzMwMcKfIzMzMDHCnyMzMzAyA9bu6AWbWdptttlk0NDR0dTPMzLqV2bNnvxARm5enu1Nk1o01NDTQ1NTU1c0wM+tWJD1dKd3TZ2ZmZma4U2RmZmYGePrMCiRtCszMp+8HVgLP5/NdIuKNFu6bDtwUEddJem8u4zzgVuC8iDhM0mhgQkR8StJ4oDEijm9DGxuAp4D/jYhv5rTNgOeAi+stM5d3U0QMKUtvBI6MiBPrbWO+/y7S8zaVpR8A7BgRZ9dQxvKI6Fstz8Ily2iYOKMtTTQz67aaz/5kp5TrTpGtEhEvAsMBJE0ClkfEubXeL6k/qSM0NSIuy8mHdXQ7gSeBTwHfzOdjgMUdWUHuzHT4Yp2IuAG4oTxd0voRsaKj6zMzs9p5+syqkrS3pLmSFkqaJqlnC1n7Ar8DfhERU/K9DZIW1VHXLpIeyPU9IGm7FrK+CjyaR3MAxgK/LJSzpaSZkhbk7x/K6VtIul7S/Pw1qqz+rXPdO0saLemmnD4pP/tdkp6UdGLhnm9K+qOk30u6StKEQpFH5OdYJGmXnH+8pPPz8XRJP5J0J/B9SVtJelDSLEnfrvVzMzOzjuFOkVXTC5gOjI2IoaSRxS+1kPdHwH0RMbkd9f0R2DMiRgBnAN+tkvdq4NOSPkia5nu2cO184IqIGAZcSZrKI3+/OyJ2Aj5CYXQpd8B+BRwVEbMq1Lc98B/ALsC3JG2QO2WHAiOAQ4DGsns2iohRwHHAtBaeY1tgn4j4KvATYEpE7Az8tcqzm5lZJ3CnyKrpATwVEX/K55cDe7aQ9w7gQEnva0d9/YFr8+jSZGBwlby3AP8OfAa4puzabsAv8vHPgd3z8SeAKQARsTIiluX0zYHfAkdExLwW6psREa9HxAvA34Etcrm/jYhXI+JfwI1l91yV67oH2FjSeyqUe21ErMzHHyvdk9tdkaRjJDVJalr5yrKWspmZWZ3cKbJqXq4j79WkDsfNkvq1sb5vA3fmRc/7k0aqKsqLvmcDXyWN8FQTrVxfBvyF1ClpyeuF45WkUTPVWW+ldpR/xq21lYiYGhGNEdHYo0//1rKbmVmN3CmyanoBDZI+nM//G7i7pcwR8WPSm2fXS9qwDfX1B5bk4/E15P8h8PW8QLzoAeDT+XgccF8+nkme/pPUQ9LGOf0N4CDgSEmfraO99wH7S+olqS9Q/jrE2FzX7sCywshUS+4va7eZma1GfvvMqnkNOIo0pbU+MAu4qNoNEfF1SZeRpn++UWd9PwAul/QV0nRcVRGxmMpvnZ0ITJN0CmlLgaNy+peBqZK+QBrt+RLpVX4i4mVJnwJ+L+ll0uhRa/XPknQDMB94mvS2WvG+f0p6ANgY+Hxr5eX2/ULSl2l99AuAoQP709RJr6aama1rFNHqaL2ZtUBS34hYLqkPcA9wTETMWV31NzY2hsN8mJnVR9LsiCh/OcYjRWbtNFXSjqSpxstXZ4fIzMw6ljtFZu0QEfWsQTIzszWYF1qbmZmZ4U6RmZmZGeBOkZmZmRngNUVm3drCJctomDijq5thZt1AZ0WWX5t4pMjaTdKmkublr79KWpKPl0p6pJ1l/6ekh3PQ1XmSrikF
eO2Adh+U3xxr6foROajs4hxA9tIWQnUU7xkv6QOt5Jku6bC2ttvMzDqHO0XWbhHxYkQMj4jhpM0dJ+fj4cBbbS1X0hDgp8DnImL7XOaVQEOFvG0Z9TwIqNgpkvSfwMnAfhExmBRA9gFSzLNqxgNVO0VmZrZmcqfIOlsPSZfk0ZbbJPUGkLSNpFskzZZ0r6TtK9z7deC7EfFoKSEibsgBVpF0l6TvSrob+LKkkZLuzmXeKmlAzne0pFl5tOdXkvpIGgUcAJyTR6C2Kav7NGBCRCzJ9a6MiGkR8Vgu84xc5iJJU5UcBjQCV+Yye0s6W9IjecTp3EL5++Tn/lPeSZt8PryUQdL9koa158M3M7PauVNknW0QcEEebVkKHJrTpwInRMRIYAJwYYV7BwOtbYb4noj4OHAeaVTpsFzmNOA7Oc+vI2LniNgJeBT4QkQ8ANwAnJJHuZ6os+7zc5lDgN7ApyLiOlKoj3F5VKs3cDAwOCKGAf9buL8B+DgpXtpFknoBl5JjvknaFugZEQvKK5Z0jKQmSU0rX2k1GomZmdXInSLrbE9FxLx8PJsUYLYvMIoUU20ecDEwoFohhXVLf5I0oXDpmvx9O2AIKXbZPOB04IP52pA8CrOQFGh1cD0PIGlorvsJSWNz8l6SHsplfqKFMl8ixY+7VNIhwCuFa7+MiLci4nHgSWB74FrgU5I2IMVKm16pPRExNSIaI6KxR5/+9TyKmZlV4bfPrLO9XjheSRo9WQ9YmkdTqllMWsszPyJeBIbnDlHfQp6X83cBiyNitwrlTAcOioj5ksYDo2tod6nuOyNiYa77fKB3HtW5EGiMiL9ImkQK8/EOEbFC0i7A3sCngeNJHSiA8qCDERGvSPo9cCBwOGkqzszMVhOPFNlqFxEvAU9JGgOQ1+PsVCHrD4DTJO1QSOvTQrGPAZtL2i2XuYGk0uhNP+C5PAIzrnDPv/K1Sr4HnCvpg4W03vl7qQP0Qh71Kr5JtqrMfK1/RNwMnERaeF4yRtJ6eS3T1rn9kKbQzgNmRcQ/WmibmZl1Ao8UWVcZB0yRdDqwAXA1ML+YISIWSvoycIWkfsCLwJ+Bb5UXFhFv5IXO50nqT/rZ/jFpxOebwEPA08BC3u4IXQ1cIulE0lqkJwrl3Sxpc+B3knqQ1kMtAm6NiKWSLsllNQOzCk2ZTloj9CqwH/DbPLIk0ttsJY8Bd5PeZjs2Il7L9c6W9BJwWS0f4tCB/Wny3iNmZh1CEeWj+GbWVfIeR3cB20dEq9sZNDY2RlNTU6e3y8xsbSJpdkS8a4mCp8/M1hCSjiSNaJ1WS4fIzMw6lqfPzNYQEXEFcEVXt8PMbF3lkSIzMzMz3CkyMzMzA9wpMjMzMwO8pshqIGkl6fXz9UlhMj4XEa9UyPdARIyqkD4duCmHweh0eTPF5RFxboX0o4HnSc9yakTcUEe5ZwH3RMTtddzTTNrk8YWWPp/2WLhkGQ0TZ3RkkWbrrGZvb7HO80iR1eLVHB9sCPAGcGzxYt7Hh47+hd9JJuedtMcA0yS94++ApBb/oxARZ9TTIapwf3f4fMzM1lnuFFm97gU+LGm0pDsl/YI0ioSk5fm7JJ2fo8PPAN5XurmlSPZFkvbPccXmSrpd0hY5fZKkaZLukvRk3nSxdM9pkh6TdDspDlpVEfEosALYLJf3XUl3A19uqY2SpucNIpHULGmzfNwo6a58vKmk23LbLyZt2lhq4/LC8dckLZQ0X9LZOW0bSbfkeu+VtH1NfyJmZtYhPH1mNcujKPsBt+SkXYAhEfFUWdaDSR2ToaQdmx8hjcpsQIpkf2BEPJ+Dq36HFPy06D7goxERkr4IfA34ar62PbAXaVfqxyRNAYaRYouNIP1MzyEFn632LLsCb5Gm0gDeExEfz228u4Y2tuRbwH0RcZakTwLHVKh7P+AgYNcc7+y9+dJU0u7Wj+f2XcjbsdLMzKyTuVNkteitFHke0kjRz0hR7h+u0CEC2BO4KiJWAs9K
uiOnFyPZA/QAnqtw/weBa/IIzYZAsY4ZEfE68Lqkv5M6XXsA15fWOUmqtk7oZElHkGKUjc0dL4Br6mxjS/YEDgGIiBmS/lkhzz7AZaX2RsQ/cpy0UcC1uV6AnpUqkHQMubPVY+PN62iamZlV406R1eLV8oj2+Rf3y5WzA++OAg/VI9kX/RT4UUTcIGk0MKlw7fXC8Ure/hmuNV7N5PIF2FnpWWpt4wrenn7uVXattbaoQp71gKXln3MlETGVNKpEzwGDHKfHzKyDeE2RdYZ7gE9L6pFHe/bK6dUi2Rf1B5bk48/VWN/BknrnwLH7t6PttbaxGRiZjw8ta8u4fO9+wCYV7r0N+LykPjnfeyPiJeApSWNymiTt1I7nMDOzOnmkyDrD9aS1MAuBP5HW6LQWyb5oEmkaaQnwB2CrapVFxBxJ1wDzgKdJU3xtUkMbSyMzZwI/k3QqKV4ZhfSrJM0hPfefK9Rxi6ThQJOkN4CbgVNJnakpkk4HNgCuBuZXa+/Qgf1p8mvEZmYdQhEefTerhaQbSdN6d3Z1W0oaGxujqampq5thZtatSJodEY3l6Z4+M6uBpGlAH9KbcWZmthby9JlZDSKi1lfyzcysm/JIkZmZmRnuFJmZmZkB7hSZmZmZAV5TZNatLVyyjIaJM7q6GWZdytHtraN4pGgtloOTzstff5W0JB8vlfRIO8veT1KTpEcl/VFSpV2ii/lHS6oaJV5Sg6RF7WlXWXmNks6r855TO6o9ksZLej5/5n+UdHJbyzIzs87nTtFaLCJejIjhOXTERaQQF8OB4aRgqG0iaQhwPnBEROxAihX2ZCu3jSbF9uo0OWDtKhHRFBEn1lnMqa1nqcs1+TP/GHCapH/r4PLNzKyDuFO07uoh6RJJiyXdJqk3gKRtJN0iabakeyVtX+HerwHfiYg/AkTEioi4MN+/v6SHJM2VdLukLSQ1AMeSgrHOk7RHTr9e0vz8Veow1dUuSdMl/UjSncD3i43Mo1M35eNJkqZJukvSk5Le1VmSdDY5+K2kKzvgc1olIl4E/j9ggKRvS/pyod7vlNoj6RRJsyQtkHRmtTLNzKxjuVO07hoEXBARg4GlvB2/aypwQkSMBCYAF1a4dwgwu4Vy7wM+GhEjSGEqvhYRzRRGqiLiXuA84O6I2An4CG+H0WhLu7YF9omIr7byzNsD/wHsAnxL0gbFixExkRz8NiLGtaM97yLpQ6TAsQuAn5FjuklaD/g0cKWkfXN9u5BG80ZK2rNCWcfkqcumla8sa+WRzcysVl5ove56KiLm5ePZQIOkvqQprmsllfL1rLPcDwLXKAWC3RB4qoV8nwCOBIiIlcAySZu0sV3X5jJaMyMiXgdel/R3YAvgmVbuae/nNFbSXsB2wNER8RrQLOlFSSNyG+ZGxIu5U7QvMDff25fUSbqnWGBETCV1yug5YJDj9JiZdRB3itZdrxeOVwK9SSOHS/MamGoWkyLEVwpW+lNSfLAbJI0mBXft7Ha93Maya/n5b8/nBGlN0fGSdgNmSPpdRPwVuBQYD7wfmJbzCvheRFxcQ7lmZtbBPH1mq0TES8BTksYAKNmpQtZzgFMlbZvzrSfpK/laf2BJPv5c4Z5/Af0K5zOBL+X7e0jauAPa1RHeLJ9W64j2RMSDwM+B0lqi64H/BHYGbs1ptwKfzyNRSBoo6X1tfhIzM6uLR4qs3DhgiqTTgQ1I64LeMSIUEQsknQRcJakPEEBps5xJpGmlJcAfgK1y+o3AdZIOBE4gdQ6mSvoCaQTmS8Bz7WlXB5kKLJA0Bzitg9vzfWCOpO9GxL/y4vClpam/iLhN0g7Ag3labjlwBPD3lgocOrA/Td6jxcysQyjCSxLMVre8wHoOMCYiHm9rOY2NjdHU1NRxDTMzWwdImh0RjeXpnj4zW80k7Uh6PX9mezpEZmbWsTx9ZraaRcQjwNZd3Q4zM3snjxSZmZmZ4U6RmZmZGeBOkZmZmRngNUW2hpO0PCL6lqUd
C7wSEVdIGg/cFhHP5msnAVMj4pV83gw0RsQLNdR1IHBURByUz78BfCEiPpzP9yftSn1Ahz1gOy1csoyGiTNaz2jdTrO3WjBb7TxSZN1ORFwUEVfk0/HABwqXTwL6tLHoB4DdCue7AS8VNlAcBdzfxrLNzGwN506RdTs54v0ESYcBjaRgqvNy5PkPAHfmjRHL7ztC0sM578WSehSvR8TzpBhsH85JA4FfkTpD5O8PSNpS0swcyX5mDvaKpOmSpki6U9KTkj4uaZqkRyVNL7RjX0kPSpoj6drCDtbNks7M6Qslbd+hH5yZmVXlTpF1WxFxHdAEjMuR7X8CPAvsFRF7FfPmnaLHAh/LMctWknalLvcAMErSdsDjpF25R0laHxgGzALOB66IiGHAlcB5hfs3IQW7PZm0i/dkYDAwVNJwSZsBpwP7RMRHcvu/Urj/hZw+BZjQxo/GzMzawGuKbF2xNymI7awcQqM3lcNn3E8aEeoBPAg8DJwBjAAei4jXcnDXQ3L+nwM/KNx/Y0SEpIXA3yJiIYCkxUAD8EFgR+D+3I4Ncz0lv87fZxfqeAdJxwDHAPTYePPant7MzFrlTpGtKwRcHhHfaCXfA6TYbD2AS3KMsl7AaFpeT1SMlfN6/v5W4bh0vj5phOr3EfGZFsoq3bOSFv5+RsRUUow2eg4Y5Dg9ZmYdxNNn1t39C+hX5bxkJnBYadG0pPdK2rJCvkdI65L2AObmtHnAsaQOE/n7p/PxOOC+Otr7B+BjpXVLkvpI2raO+83MrJN4pMjWdH0kPVM4/1HZ9enARZJeJb0tNhX4naTniuuKIuKRHNH+thyM9U3gf4Cni4Xlqa+HgP4R8WZOfpA0XVXqFJ0ITJN0CvA8cFStDxMRz+dtBK6S1DMnnw78qdYyioYO7E+TX902M+sQivDou1l31djYGE1NTV3dDDOzbkXS7IhoLE/39JmZmZkZ7hSZmZmZAe4UmZmZmQHuFJmZmZkB7hSZmZmZAX4l36xbW7hkGQ0TZ3R1M6wDNHtrBbMu55EiW6dIWt6JZY+X9HwOOFv62rGz6jMzs47lkSKzjnVNRBxf702S1o+IFZ3RIDMzq41HimydI6mvpJmS5khaKOnAnN4g6VFJl0haLOk2Sb3ztW0k3SJptqR7JW1fR30/L9WRz6+UdEAeWbpW0o2knbY3kjRN0ixJc4v3mJlZ53OnyNZFrwEHR8RHgL2AHyqHrAcGARdExGBgKXBoTp8KnBARI4EJwIUtlD22bPqsN3ApORSIpP7AKODmnH834HMR8QngNOCOiNg5t+scSRuVVyDpGElNkppWvrKsPZ+DmZkVePrM1kUCvitpT1L0+oHAFvnaUxExLx/PBhok9SV1ZK59u+9ETyqrNH12t6QLcjDaQ4BfRcSKXNbvI+IfOd++wAGSJuTzXsCHgEeLhUXEVFInjZ4DBjlOj5lZB3GnyNZF44DNgZER8aakZlIHBOD1Qr6VQG/SiOrSiBjejjp/nuv9NPD5QvrLhWMBh0bEY+2ox8zM2sjTZ7Yu6g/8PXeI9gK2rJY5Il4CnpI0BkDJTnXWOR04KZe3uIU8twInlKbyJI2osw4zM2sHjxTZOkPS+qSRoCuBGyU1AfOAP9Zw+zhgiqTTgQ2Aq4H5FfKNlbR74fy4iHggIv4m6VHgN1Xq+DbwY2BB7hg1A5+q1qihA/vT5P1tzMw6hCK8JMHWDXl055KI2KUL6u4DLAQ+EhEdtjq6sbExmpqaOqo4M7N1gqTZEdFYnu7pM1snSDoWuAo4vQvq3oc0GvXTjuwQmZlZx6p7+kzSRhHxcus5zdYcEXERcFEX1X076S0yMzNbg9U8UiRplKRHyK8HS9pJUkt7tZiZmZl1K/VMn00G/gN4ESAi5gN7dkajzMzMzFa3utYURcRfypJWdmBbzMzMzLpMPWuK/iJpFBCSNgROpGynXeueJG0KzMyn7yd1dp8HGoBnI6JNkd4ljQcaizs8S7oLmBAR7X5l
Km+62BgRL7SznEbgyIg4sVodkpojoqGG8q4HtgL6kjaJfCpfOi4iHijLO5r0eVR99b4lC5cso2HijLbcap2s2VslmHU79XSKjgV+QgqJ8AxwG/A/ndEoW70i4kVgOICkScDyiDhXUgNwU9e1bPXIHbQOe689Ig6G9nd4zMxs9ap5+iwiXoiIcRGxRUS8LyKOyL9Mbe3Wo6OjxpdImpIDmy6WdGYhvVnSmYUo9tvn9E1zG+ZKupgUFqMU3f6Pki6VtChHod9H0v2SHpe0S863i6QH8v0PSNoup4+WdFO1OrLnc54Bku7JAV8XSdqjhmetWHdZno0kTZM0K+c7sN7P1MzM2q7VkSJJPwVa3OGx0pSDrVUGAZ+JiKMl/ZIUNf7/SAFJj42IxyXtSooa/4kK95fv8PzhwvFpEfEPST2AmZKGRcSCfO2FiPiIpONIUem/CHwLuC8izpL0SeCYsnLH5LRZwGeB3YEDgFOBg0h7Be2Zg7HuA3w3P09Ri3Xk6PXksm+NiO/ktvep9gFmtdR9GnBHRHxe0nuAhyXd7i0wzMxWj1qmz7xd7rqtQ6PG5zVFJYdLOob0czgA2BEodYp+XajzkHy8Z+k4ImZI+mdZOxfmOhYDMyMiJC0krY2CFPPsckmDSB39DSq0t1odJbOAaZI2AH5T+HyqqaXufYEDJE3I571I+xu9Y+1e/syOAeix8eY1VG1mZrVotVMUEZcDSBoTEdcWr5UCZNparVOixkvaijQCtHNE/FPSdN6OVF+sdyXv/DltadSy2M63CudvFe7/NnBnRByc10vd1UJZVWPfRMQ9kvYEPgn8XNI5EXFFtXtqrFvAoRHxWCv1TyWN1NFzwCDH6TEz6yD1vJL/jRrTbC3XQVHjNwZeBpZJ2gLYr4Z77iEFZkXSfsAmddbZH1iSj8e3tQ5JWwJ/j4hLgJ8BH+mgum8FTlAefpM0ooZyzcysg9Sypmg/4L+AgZLOK1zaGFjRWQ2zNV6tUeMrioj5kuYCi4EngftruO1M4CpJc4C7gT/X2eYfkKawvgLc0Y46RgOnSHoTWA4c2UF1fxv4MbAgd4yagapvrg0d2J8mv/ptZtYhFFF99D2PAAwHzgLOKFz6F2k6oNKaCzNbDRobG6Opycv+zMzqIWl2RDSWp9eypmg+MF/SlRHhkSEzMzNbK9UyffbLiDgcmCvpXcNKETGsU1pmZmZmthrV8kr+l/N378prZmZma61W3z6LiOfy4XER8XTxCziuc5tnZmZmtnrU80r+v1dIq+U1ajMzM7M1Xi1rir5EGhHaWtKCwqV+1PYatZl1koVLltEwcUZXN2Od1OytEMzWOrWMFP0C2B+4IX8vfY2MiCNKmSTVu5GedTOSJks6qXB+q6RLC+c/zPvwtHR/g6TPFs7HSzq/nW06K8cSa7cciHazwvmqQLFV7mks27+rUp4GSYtauDZe0gfa1mIzM+tItawpWhYRzRHxmbI1Rf8oyzqzk9poa44HSDHPkLQesBkwuHB9FNVHDxtIwVQ7hKQeEXFGRNzeUWXWKyKa2hkUeTzgTpGZ2RqgnjVFrVHrWaybu5/cKSJ1hhYB/5K0iaSewA6krRumSzqsdJOk5fnwbGAPSfMknZzTPiDpFkmPS/pB4Z59JT0oaY6ka3MQ2tJozhmS7gPGFOvK187M9yyUtH1O31zS73P6xZKeLo4I1ULSRpKmSZolaa6kA3P6qtGkVurpIekSSYsl3Sapd253I3Bl/kx6Szpb0iOSFkg6t542mplZ+3Rkp8iBKddyEfEssELSh0idoweBh4DdSL/cF0TEG1WKmAjcGxHDI2JyThsOjAWGAmMl/VvuSJwO7BMRHwGagOK03GsRsXtEXF2hjhfyPVNIAWcBvgXckdOvJ0Web8mduYMyD7i0kH5aLmNnYC/gHEkbld1brZ5BwAURMRhYSgr8el1+tnE5uG5v4GBgcN7/638rNVDSMZKaJDWtfGVZlUcxM7N61LJPkVlRabRo
FPAjYGA+XkaaXqvXzIhYBiDpEWBL4D3AjsD9OTbqhqQOWMk1Vcr7df4+GzgkH+9O6mwQEbdIqhaaZq+IeCG3ZzRvd6z2BQ6QVDrvxbs7V9XqeSoi5hXa1lCh7peA14BLJc0AKq5nioipwFSAngMG+T8jZmYdpCM7RZ4+WzeU1hUNJU2f/QX4KukX+rScZwV5FDIHNt2wSnmvF45Xkn4mBfw+Ij7Twj0v11BeqSzomJ9NkUZ3HntHorRFWZ7W2lVqW+/yDBGxQtIuwN7Ap4HjgU+0ucVmZlaXmqfPJH2hQtrZhdO9O6RFtqa7n7S7+T8iYmVecP8e0hRaaTSnGRiZjw8ENsjH/yJt5dCaPwAfk/RhAEl9JG3bjjbfBxyey9oXaMubkrcCJ+ROHpJGdFA9qz6TvG6qf0TcDJxEmlo0M7PVpJ6RosMkvRYRVwJIuhDoWbpY4W00WzstJL119ouytL6laSfgEuC3kh4mvZVYGtlZQFqTNB+YDlScxoqI5yWNB67KC7ghrTH6UxvbfGYuayxwN/AcqTNSj28DPwYW5I5RM+8OfdNSPX2rlDsduEjSq6TNUH8rqRdp1OnkKvcBMHRgf5q8X46ZWYdQRG1LEiT1Ju1VNI30j/c/IuKk6neZdb3csVqZp6d2A6bkhc3dsp6ixsbGaGpq6swqzMzWOpJmR0RjeXotO1q/t3D6ReA3pCmUsyS91yNE1g18CPhl3lvpDeDobl6PmZl1glqmz2aTXrdX4fsn81cAW3da68w6QEQ8DlRaA9Qt6zEzs87RaqcoIrZaHQ0xMzMz60o1L7SWtAHwJWDPnHQXcHFEvNkJ7TIzMzNbrep5+2wK6dXqC/P5f+e0L3Z0o8zMzMxWt3o6RTtHxE6F8zvyq9UGSPogcAFpJ+b1SLsRn9JK2Itay54O3JTDQpTSlkdEtVe96ym/GWgsvFJf7/2TgOURcW4hbV/SK+qjIiIk9SCtTzsuItqy83W7SToH+C/g5og4pZC+BfAz4N9IHf/miPivKuU0kP48hkhqBI6MiBPzDthv1Pt8lf58a7VwyTIaJs6o97Y1WrO3GDCzLlJP7LOVkrYpnUjamrQz7zov71vza+A3ETEI2Ja0N813urRh7aCkzbHxIuI24NMKfOIAACAASURBVGmgtOnnCcCs9nSIJLV3B/b/B3yk2CHKziLtoL1TROxIitFWk4hoiogT8+lo3g6Ya2Zm3Uw9v/ROIQXLvEvS3cAdpPAOlkIxvBYRlwFExErSxnufz7sxj5f0a9URDb5WxSjt+fz8vPFhtajxm+ZI7XMlXUwOTyGpQdKjeWPOOcC/STpFKTL8AklnFuo5TdJjkm4HtmuheScD35A0mBSy4ustPa9S5PtZkhZJmlrYOfouSd/NP3NfljQm55kv6Z4Kn4cknZPzLMwbKSLpBmAj4KFSWsEA4JnSSUQsqFZWpc8/jx4dC5ysFFB2D0nTJR1WyLu8UO75kh5RinH2vkKekZLuljRb0q2SBrTw2ZqZWQeruVMUETNJkb5PzF/bRcSdndWwbmYwaWpolYh4Cfgz8OGc1JZo8EXn5F+2pQjutWopavx9ETGCtCFnMbDpdsAV+dp2pD/zXXL7R0raU9JIUmyuEaSgqztXqjginiPtAv0gKeL7elWe9/yI2DkihpDighV3i35PRHw8In4InAH8R57KPaBCtYfktu4E7EP63AZExAHAqxExPCLKA8peAPxM0p25s/eBamW18KzNwEXA5FzHvZXyZQeTPtuhpL2MRsGqlxl+ChwWESNJG6V229FGM7Pupt63z/4fhbfPJPnts6S0h1O19LZEgy86pXxNUY1tqxQ1fs/ScUTM0DujuT8dEX/Ix/vmr7n5vC+pk9QPuD4iXsltuaFK/RcAZ0fEdEmfouXn3UvS14A+wHuBxcCN+VqxE3M/MF3SLwvPVrQ7cFUerftbHmHamdT5qygiblWaDv5P0m7tcyUNqVLWgirPW4s9C+U+K+mOnL4d
MAT4ff58epBChbyDpGOAYwB6bLx5O5tiZmYlfvusYywGDi0mSNqYtHD3CVJw1LZEg6/Fqoj0Wa+y65WixkPlThy8MwK9gO9FxMXFDJJOqnL/O0TEW5JKeSs+r1KsrwtJi73/orRwu/gcq9oUEcdK2pW0eeg8ScMj4sWyNtct78z+C+AXeTpyz7aWVbDqzyZPB25YrLJCfgGLI2K3Vto6FZgK0HPAoNri9JiZWavqWVO0c0R8LiLuyF9H0cK0yTpoJtBH0pEASm9a/RCYXhpNaUFHRIN/GthRUk9J/YG9a7jnHmBcrnM/Wo7mfitpXVRp3c9ASe/L9x8sqbekfsD+Nba1pectdYBeyHUd1lIBkraJiIci4gzgBVLHs/zZxkrqIWlzUufm4WqNkvQJSX3ycT9gG9LUZ71lrYp4nzWTOsQAB5L+U1Fq46dzuQOAvXL6Y8DmSnHTkLSB0nosMzNbDeoZKVqZfyE9AX77rCi/cn4wcKGkb5I6mzcDp7ZyX7ujweeRlV+SpnQe5+2prmpK0dznkKK5/7mFsm+TtAPwYJ7OWQ4cERFzJF0DzCN1yqqtnymWV/F5I+JPki4BFpI6ErOqFHOOpEGkUZWZQPm2ENcDu+X0AL4WEX9tpWkjgfMllUZ2Lo2IWZKaKpWVF1VXciNwnaQDSW/bXUKKev9wbmtpxOt60uL8haQ/67sBIuKNvDD7vNzBXZ+0JmtxSw0fOrA/TX6F3cysQyiittF3SXsDlwFP5qQG4CgvtjbrOo2NjdHU1NTVzTAz61YkzY6IxvL0eqbP7gcuBt7KXxfT8qJgMzMzs26lnumzK4CXgG/n888APwfGdHSjzMzMzFa3ejpF25WF+bhTDvNhZmZma4l6ps/mSvpo6SS/Fn1/xzfJzMzMbPVrdaRI0kLSmzcbAEdK+nM+3xJ4pHObZ2ZmZrZ61DJ99qnWs5hZV1i4ZBkNE2d0dTPardnbCpjZGqDV6bOIeLra1+popK25lILLlmKy/VXSkny8NIczaWu5WygFWp2fA6fe3EHtHS/p/FbyjJZUMdp9LfeX5T+1cPweScfV3lozM1ud6llTZPYuEfFiDoA6nEJAVFIg1bfaUfRZpJAgO0XEjsDE9rZVUq0vFowmB2ntAMUNPN8D1NUpUuK/p2Zmq4H/sbXO1EPSJZIWS7pNUm9IoTok3SJptqR7JW1f4d4BwDOlk4hYFYRV0tckLcyjSGfntKMlzcppvyqE7Zgu6UeS7gS+X6xA0uY576z89bG8W/WxwMl5xGuPWh5U0hGSHs73XJxDeJwN9M5pVwJnA9vk83PyfafkuhdIOjOnNUh6VNKFwBzeHcrEzMw6gTtF1pkGARdExGBgKW8HzZ0KnBARI4EJvB1kuOgC4GeS7pR0mqQPwKpYbQcBu+YtIn6Q8/86InbOaY8CXyiUtS2wT0R8tayOn5BGtnbObbs0IpopjHhFRKshTHIolLHAx/Io2UpgXERMBF7N5YwjjXY9kc9PkbRv/ox2IY2sjZS0Zy52O+CKiBhRPk0t6RhJTZKaVr6yrLXmmZlZjerZp8isXk9FxLx8PBtoyAFfRwHX5nhqAD3Lb4yIW3N8vf8E9iNtCTEE2Ae4rBRoN0e3Bxgi6X9JU1R9ScFsS66NiEpx+vYhBdMtnW+cA8LWa29S/LRZuazewN9ruG/f/FWKV9eX1En6M/B0RPyh0k0RMZXUsaTngEG1xekxM7NWuVNknen1wvFKUmdhPWBpHlGpKnd4fgH8QtJNpCj1Im0JUW46cFBEzM9BZ0cXrr1cIT+5LbtFxKvFxEInqVYCLo+Ib7Thvu9FxMVl9TfQcpvNzKyTePrMVquIeAl4StIYWLWQeKfyfJI+UVgX1A/YhjSCchvw+cK19+Zb+gHPSdoAGFdjc24Dji/UWeqo/SuXV6uZwGGS3ldqk6Qt87U3c5sqlXtrfpa++b6BpTLMzGz180iRdYVxwBRJp5M2Bb0aKA8ZMxI4X9IKUuf90oiY
Bas6L02S3gBuJr3h9U3gIeBpYCG1dWpOBC6QtID0d+Ee0iLrG4HrJB1IWvtUvq5ovKSDCucfBU4Hbstvir0J/E9uy1RggaQ5ETFO0v2SFgG/y+uKdgAezKNTy4EjSKNqNRk6sD9N3uPHzKxDKMJLEsy6q8bGxmhqaurqZpiZdSuSZkdEY3m6p8/MzMzMcKfIzMzMDHCnyMzMzAxwp8jMzMwMcKfIzMzMDPAr+baOkrQpaX8hgPeTXoN/HmgAns1BaNtS7nigMSKOlzQJWB4R50o6C7gnIm6vsZwG4KaIGFIt38Ily2iYOKMtTa1Zs1/5N7N1hDtFtk6KiBdJ8cYo67w0ADd1Qn1ndHSZZmbWsTx9ZvZuPSRdImmxpNsk9QaQtI2kWyTNlnSvpO1rLVDSdEmH5eMzJM2StEjSVOWdGyWNlDRf0oOkzR/NzGw1cqfI7N0GARdExGBgKXBoTp9K2uF6JDABuLCN5Z8fETvnqbHewKdy+mXAiRGxW9ubbmZmbeXpM7N3eyoi5uXj2UBDjk82Cri2EDC2ZxvL30vS14A+wHuBxZLuAd4TEXfnPD8H9qt0s6RjgGMAemy8eRubYGZm5dwpMnu31wvHK0mjOesBSyNieOVbaiOpF2mEqTEi/pLXM/UCBNQUcycippJGreg5YJDj9JiZdRBPn5nVICJeAp6SNAZAyU5tKKpX/v5CHn06LJe/FFgmafd8fVx722xmZvXxSJFZ7cYBUySdDmwAXA3Mr6eAiFgq6RJgIdAMzCpcPgqYJukV4NZayhs6sD9NfmXezKxDKMKj72bdVWNjYzQ1NXV1M8zMuhVJsyOisTzd02dmZmZmuFNkZmZmBrhTZGZmZga4U2RmZmYGuFNkZmZmBviVfLNubeGSZTRMnNHm+5v9Or+Z2SoeKbI1iqSQ9PPC+fqSnpfUIZHrc2DWp3Lg1T9JukLSwBruu0tSYz5ulrRZR7THzMzWHO4U2ZrmZWBIKTI98O/AkkoZJbV1pPOUiNgJ2A6YC9wpacM2lmVmZmsJd4psTfQ7oDSv8xngqtIFSZMkTZV0G3CFpMGSHpY0T9ICSYNqrSSSycBfycFXJU2R1CRpsaQzWytD0lckLcpfJ+W0r0k6MR9PlnRHPt5b0v+1VE++fn2h7H+X9Otan8fMzNrHnSJbE10NfDoHTx0GPFR2fSRwYER8FjgW+EkO1NoIPNOG+uYA2+fj0/Iup8OAj0sa1tJNkkaSQnPsCnwUOFrSCOAeYI+crRHoK2kDYHfg3ir13AHsIGnznOco4LIK9R6TO1RNK19Z1obHNTOzStwpsjVORCwAGkijRDdXyHJDRLyajx8ETpX0dWDLQno9VDg+XNIc0rTaYGDHKvftDlwfES9HxHLg16TO0GxgpKR+wOu5jY35WqlT9K56IsXc+TlwhKT3ALuRRs3eISKmRkRjRDT26NO/DY9rZmaVuFNka6obgHMpTJ0VvFw6iIhfAAcArwK3SvpEG+oaATwqaStgArB3RAwDZvB2VPtKVCkxIt4kBXs9CniA1BHaC9imhnouA44gdQivjYgVbXgeMzNrA3eKbE01DTgrIhZWyyRpa+DJiDiP1JFqcbqrwr3Ka38GALcAG5M6XMskbUFeZ1TFPcBBkvpI2gg4mLdHgu4hdXzuyWnHAvPyaFCL9UTEs8CzwOnA9FqfxczM2s/7FNkaKSKeAX5SQ9axpOmmN0kLps8CkHQz8MXcySh3jqRvAn2APwB7RcQbwHxJc4HFwJPA/a20cY6k6cDDOenSiJibj+8FTgMejIiXJb2W04iI1uq5Etg8Ih5p7eGHDuxPk/caMjPrEEr/cTWzNYWk84G5EfGz1vI2NjZGU1PTamiVmdnaQ9Ls/LLLO3ikyGwNImk2aWrtq13dFjOzdY07RWZrkIgY2dVtMDNbV3mhtZmZmRnuFJmZmZkB7hSZmZmZAV5TZF1I0qbAzHz6fmAl8DxpN+tnI6LabtLVyh0PnEMKJLsB8Chw
ZES8ImkSsDwizm1X499Z36kR8d0WrjUD/yI9G8A9EXFiR9W9cMkyGibOqDl/s1/fNzNrkUeKrMtExIsRMTzHLbsImJyPhwNvtbP4a3LZg4E3SPsZdZZTW7m+V+k5O7JDZGZmHcudIltT9ZB0SY4if5uk3gCStpF0i6TZku6VtH21QiStD2wE/LPCtbskNebjzfKoDpJ6SDpH0ixJCyT9v5w+QNI9kuZJWiRpD0lnA71z2pW1PFh+hjmF80H5VXwkjZR0d36+WyUNqKVMMzNrP3eKbE01CLggj/QsBQ7N6VOBE/Kr6xOAC1u4f6ykeaQptPcCN9ZR9xeAZRGxM7AzcHSOV/ZZ4NY8mrUTKWzHRODVPAo0roXy7sydpnmSTo6IJ0ghPobn60cB0yVtAPwUOCw/3zTgO3W028zM2sFrimxN9VREzMvHs4EGSX2BUcC10qpYrD1buP+aiDheKeMFwCnA2TXWvS8wTNJh+bw/qZM2C5iWOy+/KbSvNXtFxAtlaZcCR0n6CmlqbxdgO2AI8Pv8fD2A58oLk3QMcAxAj403r7EJZmbWGneKbE31euF4JdCbNLK5NI/U1CQiQtKNwAm8u1O0grdHS3sV0kUajbq1vDxJewKfBH4u6ZyIuKLWtpT5FfAt4A5gdkS8KOkDwOKI2K3ajRExlTRiRs8Bgxynx8ysg3j6zLqNiHgJeErSGFgV5X6nGm7dHXiiQnozUNpB+rBC+q3Al/KIEJK2lbSRpC2Bv0fEJcDPgI/k/G+W8tbxLK/leqYAl+Xkx4DNJe2W691A0uB6yjUzs7bzSJF1N+OAKZJOJ71ufzUwv0K+sZJ2J3X8nwHGV8hzLvBLSf9NGrEpuZS0LcCcPP32PHAQMBo4RdKbwHLgyJx/KrBA0pwW1hXdKan0Sv6CiCjddyVwCHAbQES8kafszpPUn/T388fA4pY+jKED+9Pk1+zNzDqEIjz6btYVJE0A+kfEN9taRmNjYzQ1NXVgq8zM1n6SZkdEY3m6R4rMuoCk64FtgE90dVvMzCxxp8isC0TEwV3dBjMzeycvtDYzMzPDnSIzMzMzwJ0iMzMzM8Brisy6tYVLltEwcUaL15v9ur6ZWc08UrSGkbRpIU7WXyUtycdLJT3SjnLHSzq/jvxjJD0q6U5JoyWNqrXcYqDV9pLULGmzOvKvandZeoOkVyXNzdcflvS5NrRntKSb6r3PzMzWfB4pWsNExIvAcABJk4DlEXGupAZgdf4y/gJwXETcWWoH8MBqrL+tVrW7wrUnImIEgKStgV9LWi8iLquQ910k+e+LmdXkzTff5JlnnuG1117r6qas03r16sUHP/hBNtigtqAD/ke+e+kh6RJSUNQlwIER8aqkbUhBTzcHXgGOjog/1lKgpCOAE4ENgYeA44DTSKExtpK0ANgDWJnznhAR99baYElTSJHmewPXRcS3cnozcDmwP2ln6jER8UdJmwJX5Wd5mBSHrFK5nwFOzddnRMTXJZ1RaPcNEXFKS+2KiCdzMNYfApdJ2oW0e3Rv4FXgqIh4TNJ4UqyzXsBGwFmFNuxM2s36UNLu1h8Cts7ffxwR5+V8lT5jSKFCGoEApkXEZEknAseS4rI9EhGfrv4Jm9ma6JlnnqFfv340NDRQCGBtq1FE8OKLL/LMM8+w1VZb1XSPp8+6l0HABRExGFhK+mUM6RfzCRExEpgAXFhLYZJ2IEVo/1gOsroSGBcRZwFN+XgMcBEwOSKGt9AhGluY8ptH+kVfclreNXQY8HFJwwrXXoiIj5Dif03Iad8C7ssjOjeQOhjl7f4A8H3SxofDgZ0lHVTW7hY7RAVzgO3z8R+BPXO9ZwDfLeTbDfhcRKzaaDFPJ15E6pg+mZO3B/6DFPH+Wzl2WcXPOLd7YEQMiYihvB3/bCIwIiKGkTpH7yLpGElNkppWvrKshsc0s9XttddeY9NNN3WHqAtJYtNNN61rtM4jRd3LUxExLx/PBhok9SWNHF1b+MvXs8by
9iYFRJ2V7+0N/L0N7bomIo4vnUi6q3DtcEnHkH7WBgA7AgvytV/n77NJMcAA9iwdR8QMSf+sUN/OwF0R8Xyu78p832/qbHfxX6v+wOWSBpFGbopjrb+PiH8UzncgdUT3jYhnC+kzIuJ14HVJfwe2oOXP+EZga0k/BWaQ45+RPpsrJf2mpeeJiKm5fnoOGOQ4PWZrKHeIul69fwbuFHUvrxeOV5J+wa4HLM2jEPUScHlEfKMjGveuwqWtSCNAO0fEPyVNJ01DlZSeZyXv/Fls7Rd9R/1LMwJ4NB9/G7gzIg7O67fuKuR7uey+50jPMQIodorK/3zWp8pnLGkn0sjS/wCHA58nTdXtCRwAfFPS4IhY0YZnMzOzOrlT1M1FxEuSnpI0JiKuzVHdh0VEpcjx5WYCv5U0OSL+Lum9QL+IeLos37+AjdvQvI1JHYplkrYA9uOdnY1K7iFNL/2vpP2ATSrkeQj4SX4r7Z/AZ4Cf1tOw3PE5t3Bff9I6LYDxrdy+lLSg+zZJL0fEXVXyVvyMSZ/LGxHxK0lPANMlrQf8W17cfh/wWaBvrs/MurFqW2e0hbfb6BzuFK0dxgFTJJ1Omva5GqjUKRov6aDC+UeB00m/3NcD3iSNWpR3im4ErpN0IHUstI6I+ZLmAouBJ4H7a7jtTOAqSXOAu4E/Vyj3OUnfAO4kjcTcHBG/raHsbXJ7epE6ej8tvHn2A9L02VeAO1orKCL+Jml/4HeSPl8l3yP5z6X8M36VtMC7tK7vG0AP4P8k9c/PNTkiqnaIhg7sT5P/cTSzbuKuu+5iww03ZNSotMvLRRddRJ8+fTjyyCO7uGWJIrwkway7amxsjKampq5uhpmVefTRR9lhhx1Wna9LI0UrVqxg/fUrj7lMmjSJvn37MmHChIrXO0P5nwWApNn5JaB38NtnZmZma6Hm5mZ22GEHjj76aAYPHsy+++7Lq6++yujRoyn9Z+qFF16goaEBgOnTp3PQQQex//77s9VWW3H++efzox/9iBEjRvDRj36Uf/zjHy3WNXr0aE499VQ+/vGP85Of/IQbb7yRXXfdlREjRrDPPvvwt7/9jebmZi666CImT57M8OHDuffee5k0aRLnnnvuqjK+/vWvs8suu7Dtttty771pUuKVV17h8MMPZ9iwYYwdO5Zdd92VpqYmVq5cyfjx4xkyZAhDhw5l8uTJ7f7MPH1mZma2lnr88ce56qqruOSSSzj88MP51a9+VTX/okWLmDt3Lq+99hof/vCH+f73v8/cuXM5+eSTueKKKzjppJNavHfp0qXcfffdAPzzn//kD3/4A5K49NJL+cEPfsAPf/hDjj322HeMFM2cOfMdZaxYsYKHH36Ym2++mTPPPJPbb7+dCy+8kE022YQFCxawaNEihg9P7xXNmzePJUuWsGjRolX1t5c7RWZmZmuprbbaalUnYuTIkTQ3N1fNv9dee9GvXz/69etH//792X///QEYOnQoCxYsqHrv2LFjVx0/88wzjB07lueee4433nij5s0TDznkkHe19b777uPLX/4yAEOGDGHYsLTd3dZbb82TTz7JCSecwCc/+Un23XffmuqoxtNnZmZma6mePd/etq5Hjx6r1vu89dZbAO/a2LCYf7311lt1vt5667FiRfXdQTbaaKNVxyeccALHH388Cxcu5OKLL655A8VSfaW2QtqZupJNNtmE+fPnM3r0aC644AK++MUv1lRHNR4pMjMz62Rr0sLohoYGZs+ezS677MJ1113XKXUsW7aMgQMHAnD55ZevSu/Xrx8vvfRSXWXtvvvu/PKXv2SvvfbikUceYeHChUBaD7Xhhhty6KGHss022zB+/Ph2t7tTR4okhaQfFs4n5OCinVFXg6TPFs4bJZ3XGXUV6rhK0gJJJ5elHyRpx8J5u6LGS1peIe1YSUfm4+1ziI25OQ5am0naQ9LiXF7vsmsri+E88l4/baljkqS6Xj2QdICkiVWuN0ha1EoZDZJezZ/T
o5IelvS5etpRK0nTJR3WGWUXLVyyjIaJMyp+mZlVMmHCBKZMmcKoUaN44YUXOqWOSZMmMWbMGPbYYw8222yzVen7778/119//aqF1rU47rjjeP755xk2bBjf//73GTZsGP3792fJkiWMHj2a4cOHM378eL73ve+1u92d+kq+pNdIu//uHBEv5F+EfSNiUo33r1/czbf8vCzvaGBCRHyq/S2vqW3vBx6KiC0rXJsO3BQR1+Xzu3Lb2vTutKTlEdG3yvWJQO9SsNX2kHQR6bneFTm+WjvyppGKiLdqqGMSsDwizq2xTS3+uRfyNJA+8yG15pG0NSnUyE8qPW97lP8MdJaeAwbFgM/9uOK1Nel/pmbrmkqvgVvbrFy5kjfffJNevXrxxBNPsPfee/OnP/2JDTfcsKb716RX8leQYjSdXH5B0v6SHsr/a78973hcGkWYKuk24IoK5w2S7pU0J3+NykWeDezx/7d378FSl3Ucx98fLnlkwBui43gwTAEzFUw0L2SK5Jha6hiNeUtGMw0TzcvYNJpmM1oaw6SAoqmRN3RQQCuBBkolKeQucskRHI81gUdREe9+++N51rMcds9l9Jzdxc9rZoezv8v+vr9nV/e7z/P8ft/cg3GppKMkPS6pi6S1knYoOvYLknaV1EfSFEnz8+OIEnHWSbpb0rIc69F51Uxgl3y8rxdtfzipRMNNeV2h52ZE7plYXdheUldJN+VjL5X0o7Y2bKG3RdLxwCXAeZLm5HVn5mMtlnS7pK4l9j8mn88ySXdJ2kbSeaRyE9co1RNrLYZ+ucdlPKm4al9JVxSdz3VF2/5c0ipJfwUGFi3fS9ITkhbk93WfvPweSWPyOf1a0jmSbs3rdpX0qKQl+XF4s7i+lM/t4Jbiz4Vcf0qqYI+knSRNzbHPUy5eq2Y9W5KeywkWkq6WtFLSLKWewy16wCRdk9vkufxZVl5+saTn8/EezMu+oaaeuEWSerX2PpiZbc02bdrE0KFDGTRoEKeccgoTJkxoc0LUXp0xp2gcsFTSb5otfxo4NCIifxlfCVyW1x0EDI2Id5R6FYqf9wC+GRHvKhXvfIBUlf0qinqKlHqOiIiPJU0DTiHdQfhrwNp8R+L7SXcNflrSHsAMUrHPYqPy6+yfv7BnShpASnweb15zLCL+IWk6m/cUAXSLiENyEvMLYDipVMQbEXGwpG2AuZJmRsSatjZuRPxZqXdnY0TcrM2rsn+QE5YzgEmFfSTVAfcAx0TEakmTgAsjYqykoZTv4dhWUqEg7RpSsjsQGBkRP5Z0LNCfVCVewHRJR5JKWpxGqhXWjZRALcivMxG4ICL+nd+b8UChGv0AYHhEfCTpnKI4fgf8Pdcp60oqhbFjPreBpDt6jywqntuShaTq9pDupr0oIk6WNCy3WdmackpDoqeWOa9it0bEL/M+fwROJN0l/Cpgz4h4T01J++XAqIiYq1Tst+3lnc3MOtioUaOYO3fz4gSjR49m5MiRHXbMXr160Vk3qe3wpCjX5ppE+jX+TtGqemCypN2AL5C+ZAumR8Q7ZZ53B26VNJhUdHNAG8KYDFwD3E36cp6clw8H9lVTFd3tJPWKiLeK9h1Kro8VESslvZSP2b6ZYptXhO+X/z4WOEBNc0+2JyUVbU6KSihXlb3YQGBNRKzOz/9ASv5Kj8M0eac4Ccy9JS9FxLy86Nj8WJSf9ySdTy/g0YjYlPebnv/tCRwOPFz0HjRd+gAPR8RHJeIYBpwNkNe/IWlHoA8wDTg1Ipa3ci6fnEbR30NJSQ4RMVtSb6WSG+UMBaYVPpuSHiuz3dGSrgR6ADuRyp48BiwF7pM0FZiat50LjMk9dY9ERMMWAUvnA+cDdN2uT9vO0sw6XUS0u0p7tRs3blylQ2iX9k4R6qyrz8aSfkUXz9u4BRgTEdNzr861ReuaVyUvfn4p8D9gEGn4ry2/pJ8B9pbUBzgZ+FVe3gU4rFkC1txn9YkuVRFepFpiMz6jYxRes2RV
9mbbfFaK3xsBN0TE7ZsdTLoESla+7wJsaN7bVua12+IN4GXgCFLi0RYHAivy36XaJUjDwMVDzXUt6hmvjAAABq5JREFUbL+Z3Cs3HhgSES/nns/C/icAR5J6Ha+W9JWIuFHSn4DjgXmShkfEys0CiphI6mFjm936u06PWRWqq6ujsbGR3r17b3WJUa2ICBobG6mrq2t946xTkqKIeE3SQ6Thorvy4uKq5O25Amh7oCEPi/2AVEQTUoHPkvMv8hDdo8AYYEVENOZVM4GLgJsAJA0uMeRSqNo+Ow+b7QGsAnZrIcaysTQzA7hQ0uw81DUAeCUi2psMFCtZlT1is8r3K4F+kvaOiBeAs0jFVz+tGcD1ku6LiI2SdicVQH2SVAX+RtJn7tvA7bkXcY2kERHxcJ5rc0BElCpm2/wcLwTG5uGzws0x3iclvTOUJoXf39KL5J6um8k9gTS919fnRP3VHONa0pAXkr4KFO5C9jRwu6Qb8nmdANzR7DCF/xpfzT1j3yUV1+0C9I2IOZKeBk4HekrqHRHLgGWSDiMN7a3EzGpKfX09DQ0NrF+/vtKhfK7V1dVRX1/f5u078z5FvyUlIAXXkoZNXgHm0fRF05rxwBRJI0hV0gsJxFLgQ0lLSPNlFjXbbzIwHzinaNnFwDhJS0lt8SRwQYnj3SZpGanH4Jw8B6SlGB8E7pB0MelLsJw7SUNpC3NCsJ70pd5cD0nFwyhjyr1gC1XZXyra5l1JI0nt343ULre1dEJtEREz85ymZ3L7bATOjIiFkiYDi3McxddhngFMyDF3J7Vda0nRaGCipHNJPW8Xkq5yJCLelnQiMEvS2xExrdm+e0laREpW3gJuKbry7FrSvLOlwCaakvUpwNl5PtV8YHU+1vw8FLgkn9ezpN6q4jbZIOkOYBmwNu8PKZm/Nw/PiTS3bYOk65Um838EPA/8paWG2H/37XnWV5mZVZ3u3bu3+S7OVj069JJ8s62dpJ65V6wHKak+PyIWdtbxhwwZEp01AdHMbGuhMpfk+47WZp/ORKUbddaR5nJ1WkJkZmafLSdFZp9CRJze+lZmZlYLPHxmVsMkvUWa+G9b2hnomBoGtc9tU57bprytqW2+GBFb3NPEPUVmtW1VqXFxA0nPum1Kc9uU57Yp7/PQNh1d5sPMzMysJjgpMjMzM8NJkVmtm1jpAKqY26Y8t015bpvytvq28URrMzMzM9xTZGZmZgY4KTIzMzMDnBSZ1SRJx0laJekFSVdVOp5qIukuSeskPVfpWKqNpL6S5khaIWm5pNGVjqlaSKqT9C9JS3LbXFfpmKqJpK6SFkl6vNKxdCQnRWY1RlJXYBzwLWBf4Pu51Igl9wDHVTqIKvUhcFlEfBk4FBjlz84n3gOGRcQgYDBwnKRDKxxTNRkNrKh0EB3NSZFZ7TkEeCEiXoyI94EHgZMqHFPViIgngdcqHUc1ioj/FurzRcRbpC+53SsbVXWIZGN+2j0/fCUSIKkeOAG4s9KxdDQnRWa1Z3fg5aLnDfiLzdpJUj/gQOCflY2keuQhosXAOmBWRLhtkrHAlcDHlQ6kozkpMqs9KrHMv2itzST1BKYAl0TEm5WOp1pExEcRMRioBw6RtF+lY6o0SScC6yJiQaVj6QxOisxqTwPQt+h5PfCfCsViNUZSd1JCdF9EPFLpeKpRRGwA/obnpgEcAXxH0lrSUP0wSfdWNqSO46TIrPbMB/pL2lPSF4DTgOkVjslqgCQBvwdWRMSYSsdTTST1kbRD/ntbYDiwsrJRVV5E/Cwi6iOiH+n/NbMj4swKh9VhnBSZ1ZiI+BC4CJhBmij7UEQsr2xU1UPSA8AzwEBJDZLOrXRMVeQI4CzSr/3F+XF8pYOqErsBcyQtJf3wmBURW/Xl57Yll/kwMzMzwz1FZmZmZoCTIjMzMzPASZGZmZkZ4KTIzMzMDHBSZGZmZjWivQWfJX1P0vO5yO/9rW7vq8/MzMysFkg6EtgITIqIFu84Lqk/
8BCp0O/rknaJiHUt7eOeIjMzM6sJpQo+S9pL0hOSFkh6StI+edUPgXER8Xret8WECJwUmZmZWW2bCPwkIg4CLgfG5+UDgAGS5kqaJ6nVsi3dOjBIMzMzsw6TixsfDjycqtgAsE3+txvQHziKVCPyKUn75dp2JTkpMjMzs1rVBdgQEYNLrGsA5kXEB8AaSatISdL8ll7MzMzMrOZExJukhGcEpKLHkgbl1VOBo/PynUnDaS+29HpOiszMzKwmlCn4fAZwrqQlwHLgpLz5DKBR0vPAHOCKiGhs8fV9Sb6ZmZmZe4rMzMzMACdFZmZmZoCTIjMzMzPASZGZmZkZ4KTIzMzMDHBSZGZmZgY4KTIzMzMD4P9kLePINOI6vwAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Sort by rating count so the most-rated book appears at the top of the horizontal bar chart\n",
    "metadata.sort_values(by='num_ratings').plot(x='book_title', y='num_ratings', kind='barh')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Collect Goodreads Reviews (Chrome)\n",
    "## Full text of review, shelves, date, user name, # likes, etc."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Make new directory for book reviews"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir classic_book_reviews"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Run Goodreads reviews collection script"
   ]
  },
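  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Below we run `get_reviews.py` with `--sort_order` set to `default`, which collects the most-liked and most-commented-on Goodreads reviews, and with Chrome as the web browser.\n",
    "\n",
    "As the log output shows, the script re-scrapes a book when it detects duplicate reviews and skips a page when a pop-up intercepts a click, so some books end up with fewer than 300 reviews (for example, 90 for *The Great Gatsby*)."
   ]
  },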
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2020-10-16 15:46:36.115814 get_reviews.py: Scraping 1885.Pride_and_Prejudice...\n",
      "2020-10-16 15:46:36.116074 get_reviews.py: #1 out of 15 books\n",
      "Scraped page 1\n",
      "Scraped page 2\n",
      "Scraped page 3\n",
      "Scraped page 4\n",
      "Scraped page 5\n",
      "Scraped page 6\n",
      "Scraped page 7\n",
      "Scraped page 8\n",
      "Scraped page 9\n",
      "Scraped page 10\n",
      "ERROR: 30 duplicates found! Re-scraping this book.\n",
      "Scraped page 1\n",
      "🚨 ElementClickInterceptedException (Likely a pop-up)🚨\n",
      "🔄 Refreshing Goodreads site and skipping problem page 2🔄\n",
      "Scraped page 3\n",
      "Scraped page 4\n",
      "Scraped page 5\n",
      "Scraped page 6\n",
      "Scraped page 7\n",
      "Scraped page 8\n",
      "Scraped page 9\n",
      "Scraped page 10\n",
      "ERROR: 30 duplicates found! Re-scraping this book.\n",
      "Scraped page 1\n",
      "Scraped page 2\n",
      "Scraped page 3\n",
      "Scraped page 4\n",
      "Scraped page 5\n",
      "Scraped page 6\n",
      "Scraped page 7\n",
      "Scraped page 8\n",
      "Scraped page 9\n",
      "Scraped page 10\n",
      "ERROR: 30 duplicates found! Re-scraping this book.\n",
      "Scraped page 1\n",
      "Scraped page 2\n",
      "Scraped page 3\n",
      "Scraped page 4\n",
      "Scraped page 5\n",
      "Scraped page 6\n",
      "Scraped page 7\n",
      "Scraped page 8\n",
      "Scraped page 9\n",
      "Scraped page 10\n",
      "2020-10-16 15:49:28.676599 get_reviews.py: Scraped ✨300✨ reviews for 1885.Pride_and_Prejudice\n",
      "=============================\n",
      "2020-10-16 15:49:28.691914 get_reviews.py: Scraping 2657.To_Kill_a_Mockingbird...\n",
      "2020-10-16 15:49:28.691930 get_reviews.py: #2 out of 15 books\n",
      "Scraped page 1\n",
      "Scraped page 2\n",
      "Scraped page 3\n",
      "Scraped page 4\n",
      "Scraped page 5\n",
      "Scraped page 6\n",
      "Scraped page 7\n",
      "Scraped page 8\n",
      "Scraped page 9\n",
      "Scraped page 10\n",
      "2020-10-16 15:50:14.939548 get_reviews.py: Scraped ✨300✨ reviews for 2657.To_Kill_a_Mockingbird\n",
      "=============================\n",
      "2020-10-16 15:50:14.993548 get_reviews.py: Scraping 4671.The_Great_Gatsby...\n",
      "2020-10-16 15:50:14.993596 get_reviews.py: #3 out of 15 books\n",
      "Scraped page 1\n",
      "Scraped page 2\n",
      "Scraped page 3\n",
      "🚨 ElementClickInterceptedException (Likely a pop-up)🚨\n",
      "🔄 Refreshing Goodreads site and skipping problem page 4🔄\n",
      "🚨 ElementClickInterceptedException (Likely a pop-up)🚨\n",
      "🔄 Refreshing Goodreads site and skipping problem page 5🔄\n",
      "🚨 ElementClickInterceptedException (Likely a pop-up)🚨\n",
      "🔄 Refreshing Goodreads site and skipping problem page 6🔄\n",
      "🚨 ElementClickInterceptedException (Likely a pop-up)🚨\n",
      "🔄 Refreshing Goodreads site and skipping problem page 7🔄\n",
      "🚨 ElementClickInterceptedException (Likely a pop-up)🚨\n",
      "🔄 Refreshing Goodreads site and skipping problem page 8🔄\n",
      "🚨 ElementClickInterceptedException (Likely a pop-up)🚨\n",
      "🔄 Refreshing Goodreads site and skipping problem page 9🔄\n",
      "2020-10-16 15:51:13.294371 get_reviews.py: Scraped ✨90✨ reviews for 4671.The_Great_Gatsby\n",
      "=============================\n",
      "2020-10-16 15:51:13.300570 get_reviews.py: Scraping 10210.Jane_Eyre...\n",
      "2020-10-16 15:51:13.300592 get_reviews.py: #4 out of 15 books\n",
      "Scraped page 1\n",
      "Scraped page 2\n",
      "Scraped page 3\n",
      "Scraped page 4\n",
      "Scraped page 5\n",
      "Scraped page 6\n",
      "Scraped page 7\n",
      "Scraped page 8\n",
      "Scraped page 9\n",
      "Scraped page 10\n",
      "2020-10-16 15:51:55.747203 get_reviews.py: Scraped ✨300✨ reviews for 10210.Jane_Eyre\n",
      "=============================\n",
      "2020-10-16 15:51:55.770703 get_reviews.py: Scraping 1371.The_Iliad...\n",
      "2020-10-16 15:51:55.770720 get_reviews.py: #5 out of 15 books\n",
      "Scraped page 1\n",
      "Scraped page 2\n",
      "Scraped page 3\n",
      "Scraped page 4\n",
      "Scraped page 5\n",
      "Scraped page 6\n",
      "Scraped page 7\n",
      "Scraped page 8\n",
      "Scraped page 9\n",
      "Scraped page 10\n",
      "2020-10-16 15:52:36.237778 get_reviews.py: Scraped ✨300✨ reviews for 1371.The_Iliad\n",
      "=============================\n",
      "2020-10-16 15:52:36.253734 get_reviews.py: Scraping 6185.Wuthering_Heights...\n",
      "2020-10-16 15:52:36.253752 get_reviews.py: #6 out of 15 books\n",
      "Scraped page 1\n",
      "Scraped page 2\n",
      "Scraped page 3\n",
      "Scraped page 4\n",
      "Scraped page 5\n",
      "Scraped page 6\n",
      "Scraped page 7\n",
      "Scraped page 8\n",
      "Scraped page 9\n",
      "Scraped page 10\n",
      "2020-10-16 15:53:17.215696 get_reviews.py: Scraped ✨300✨ reviews for 6185.Wuthering_Heights\n",
      "=============================\n",
      "2020-10-16 15:53:17.233707 get_reviews.py: Scraping 5107.The_Catcher_in_the_Rye...\n",
      "2020-10-16 15:53:17.233728 get_reviews.py: #7 out of 15 books\n",
      "Scraped page 1\n",
      "Scraped page 2\n",
      "Scraped page 3\n",
      "Scraped page 4\n",
      "Scraped page 5\n",
      "Scraped page 6\n",
      "Scraped page 7\n",
      "Scraped page 8\n",
      "Scraped page 9\n",
      "Scraped page 10\n",
      "2020-10-16 15:53:58.573529 get_reviews.py: Scraped ✨299✨ reviews for 5107.The_Catcher_in_the_Rye\n",
      "=============================\n",
      "2020-10-16 15:53:58.602727 get_reviews.py: Scraping 11337.The_Bluest_Eye...\n",
      "2020-10-16 15:53:58.602753 get_reviews.py: #8 out of 15 books\n",
      "Scraped page 1\n",
      "Scraped page 2\n",
      "Scraped page 3\n",
      "Scraped page 4\n",
      "Scraped page 5\n",
      "Scraped page 6\n",
      "Scraped page 7\n",
      "Scraped page 8\n",
      "Scraped page 9\n",
      "Scraped page 10\n",
      "2020-10-16 15:54:38.357758 get_reviews.py: Scraped ✨300✨ reviews for 11337.The_Bluest_Eye\n",
      "=============================\n",
      "2020-10-16 15:54:38.376343 get_reviews.py: Scraping 320.One_Hundred_Years_of_Solitude...\n",
      "2020-10-16 15:54:38.376364 get_reviews.py: #9 out of 15 books\n",
      "Scraped page 1\n",
      "Scraped page 2\n",
      "Scraped page 3\n",
      "Scraped page 4\n",
      "Scraped page 5\n",
      "Scraped page 6\n",
      "Scraped page 7\n",
      "Scraped page 8\n",
      "Scraped page 9\n",
      "Scraped page 10\n",
      "2020-10-16 15:55:19.521779 get_reviews.py: Scraped ✨300✨ reviews for 320.One_Hundred_Years_of_Solitude\n",
      "=============================\n",
      "2020-10-16 15:55:19.536372 get_reviews.py: Scraping 36529.Narrative_of_the_Life_of_Frederick_Douglass...\n",
      "2020-10-16 15:55:19.536427 get_reviews.py: #10 out of 15 books\n",
      "Scraped page 1\n",
      "Scraped page 2\n",
      "Scraped page 3\n",
      "Scraped page 4\n",
      "Scraped page 5\n",
      "Scraped page 6\n",
      "Scraped page 7\n",
      "Scraped page 8\n",
      "Scraped page 9\n",
      "Scraped page 10\n",
      "2020-10-16 15:56:02.153638 get_reviews.py: Scraped ✨300✨ reviews for 36529.Narrative_of_the_Life_of_Frederick_Douglass\n",
      "=============================\n",
      "2020-10-16 15:56:02.170347 get_reviews.py: Scraping 1934.Little_Women...\n",
      "2020-10-16 15:56:02.170369 get_reviews.py: #11 out of 15 books\n",
      "Scraped page 1\n",
      "Scraped page 2\n",
      "Scraped page 3\n",
      "Scraped page 4\n",
      "Scraped page 5\n",
      "Scraped page 6\n",
      "Scraped page 7\n",
      "Scraped page 8\n",
      "Scraped page 9\n",
      "Scraped page 10\n",
      "2020-10-16 15:56:52.025971 get_reviews.py: Scraped ✨300✨ reviews for 1934.Little_Women\n",
      "=============================\n",
      "2020-10-16 15:56:52.053434 get_reviews.py: Scraping 12296.The_Scarlet_Letter...\n",
      "2020-10-16 15:56:52.053457 get_reviews.py: #12 out of 15 books\n",
      "Scraped page 1\n",
      "Scraped page 2\n",
      "Scraped page 3\n",
      "Scraped page 4\n",
      "Scraped page 5\n",
      "Scraped page 6\n",
      "Scraped page 7\n",
      "Scraped page 8\n",
      "Scraped page 9\n",
      "Scraped page 10\n",
      "2020-10-16 15:57:38.575974 get_reviews.py: Scraped ✨300✨ reviews for 12296.The_Scarlet_Letter\n",
      "=============================\n",
      "2020-10-16 15:57:38.592949 get_reviews.py: Scraping 18423.The_Left_Hand_of_Darkness...\n",
      "2020-10-16 15:57:38.592966 get_reviews.py: #13 out of 15 books\n",
      "Scraped page 1\n",
      "Scraped page 2\n",
      "Scraped page 3\n",
      "Scraped page 4\n",
      "Scraped page 5\n",
      "Scraped page 6\n",
      "Scraped page 7\n",
      "Scraped page 8\n",
      "Scraped page 9\n",
      "Scraped page 10\n",
      "2020-10-16 15:58:24.981084 get_reviews.py: Scraped ✨300✨ reviews for 18423.The_Left_Hand_of_Darkness\n",
      "=============================\n",
      "2020-10-16 15:58:25.003677 get_reviews.py: Scraping 14942.Mrs_Dalloway...\n",
      "2020-10-16 15:58:25.003701 get_reviews.py: #14 out of 15 books\n",
      "Scraped page 1\n",
      "Scraped page 2\n",
      "Scraped page 3\n",
      "Scraped page 4\n",
      "Scraped page 5\n",
      "Scraped page 6\n",
      "Scraped page 7\n",
      "Scraped page 8\n",
      "Scraped page 9\n",
      "Scraped page 10\n",
      "2020-10-16 15:59:09.174032 get_reviews.py: Scraped ✨300✨ reviews for 14942.Mrs_Dalloway\n",
      "=============================\n",
      "2020-10-16 15:59:09.205588 get_reviews.py: Scraping 38447.The_Handmaid_s_Tale...\n",
      "2020-10-16 15:59:09.205618 get_reviews.py: #15 out of 15 books\n",
      "Scraped page 1\n",
      "Scraped page 2\n",
      "Scraped page 3\n",
      "Scraped page 4\n",
      "Scraped page 5\n",
      "Scraped page 6\n",
      "Scraped page 7\n",
      "Scraped page 8\n",
      "Scraped page 9\n",
      "Scraped page 10\n",
      "2020-10-16 15:59:50.792094 get_reviews.py: Scraped ✨300✨ reviews for 38447.The_Handmaid_s_Tale\n",
      "=============================\n",
      "2020-10-16 15:59:53.336971 get_reviews.py:\n",
      "\n",
      "🎉 Success! All book reviews scraped. 🎉\n",
      "\n",
      "Goodreads review files have been output to /classic_book_reviews\n",
      "Goodreads scraping run time = ⏰ 0:13:20.389575 ⏰\n"
     ]
    }
   ],
   "source": [
    "!python get_reviews.py --book_ids_path goodreads_classics_sample.txt \\\n",
    "--output_directory_path classic_book_reviews --sort_order default --browser chrome"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's read the aggregated JSON file into a pandas DataFrame and see what the data looks like:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>book_id_title</th>\n",
       "      <th>book_id</th>\n",
       "      <th>book_title</th>\n",
       "      <th>review_url</th>\n",
       "      <th>review_id</th>\n",
       "      <th>date</th>\n",
       "      <th>rating</th>\n",
       "      <th>user_name</th>\n",
       "      <th>user_url</th>\n",
       "      <th>text</th>\n",
       "      <th>num_likes</th>\n",
       "      <th>sort_order</th>\n",
       "      <th>shelves</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>320.One_Hundred_Years_of_Solitude</td>\n",
       "      <td>320</td>\n",
       "      <td>One Hundred Years of Solitude</td>\n",
       "      <td>https://www.goodreads.com/review/show/14889645</td>\n",
       "      <td>14889645</td>\n",
       "      <td>2008-02-08</td>\n",
       "      <td>3</td>\n",
       "      <td>Chris</td>\n",
       "      <td>/user/show/858949-chris</td>\n",
       "      <td>Revised 28 March 2012Huh? Oh. Oh, man. Wow.I j...</td>\n",
       "      <td>1527</td>\n",
       "      <td>default</td>\n",
       "      <td>[fantasy]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>320.One_Hundred_Years_of_Solitude</td>\n",
       "      <td>320</td>\n",
       "      <td>One Hundred Years of Solitude</td>\n",
       "      <td>https://www.goodreads.com/review/show/42810714</td>\n",
       "      <td>42810714</td>\n",
       "      <td>2009-01-12</td>\n",
       "      <td>5</td>\n",
       "      <td>Meg</td>\n",
       "      <td>/user/show/1009267-meg</td>\n",
       "      <td>I guarantee that 95% of you will hate this boo...</td>\n",
       "      <td>1833</td>\n",
       "      <td>default</td>\n",
       "      <td>[]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>320.One_Hundred_Years_of_Solitude</td>\n",
       "      <td>320</td>\n",
       "      <td>One Hundred Years of Solitude</td>\n",
       "      <td>https://www.goodreads.com/review/show/11478967</td>\n",
       "      <td>11478967</td>\n",
       "      <td>2008-01-02</td>\n",
       "      <td>1</td>\n",
       "      <td>Adam</td>\n",
       "      <td>/user/show/735606-adam</td>\n",
       "      <td>So I know that I'm supposed to like this book ...</td>\n",
       "      <td>1342</td>\n",
       "      <td>default</td>\n",
       "      <td>[classics]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>320.One_Hundred_Years_of_Solitude</td>\n",
       "      <td>320</td>\n",
       "      <td>One Hundred Years of Solitude</td>\n",
       "      <td>https://www.goodreads.com/review/show/977161604</td>\n",
       "      <td>977161604</td>\n",
       "      <td>2014-06-25</td>\n",
       "      <td>5</td>\n",
       "      <td>Lisa</td>\n",
       "      <td>/user/show/32532774-lisa</td>\n",
       "      <td>\"What is your favourite book, mum?\" How many t...</td>\n",
       "      <td>592</td>\n",
       "      <td>default</td>\n",
       "      <td>[favorites, havanas-en-masse, unforgettable, n...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>320.One_Hundred_Years_of_Solitude</td>\n",
       "      <td>320</td>\n",
       "      <td>One Hundred Years of Solitude</td>\n",
       "      <td>https://www.goodreads.com/review/show/24243058</td>\n",
       "      <td>24243058</td>\n",
       "      <td>2008-06-11</td>\n",
       "      <td>1</td>\n",
       "      <td>Laura</td>\n",
       "      <td>/user/show/1040930-laura</td>\n",
       "      <td>More like A Hundred Years of Torture. I read t...</td>\n",
       "      <td>549</td>\n",
       "      <td>default</td>\n",
       "      <td>[]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4284</th>\n",
       "      <td>1934.Little_Women</td>\n",
       "      <td>1934</td>\n",
       "      <td>Little Women</td>\n",
       "      <td>https://www.goodreads.com/review/show/1689777505</td>\n",
       "      <td>1689777505</td>\n",
       "      <td>2019-12-11</td>\n",
       "      <td>3</td>\n",
       "      <td>Leah</td>\n",
       "      <td>/user/show/9846570-leah</td>\n",
       "      <td>This book is huge! It was originally published...</td>\n",
       "      <td>9</td>\n",
       "      <td>default</td>\n",
       "      <td>[]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4285</th>\n",
       "      <td>1934.Little_Women</td>\n",
       "      <td>1934</td>\n",
       "      <td>Little Women</td>\n",
       "      <td>https://www.goodreads.com/review/show/1407372705</td>\n",
       "      <td>1407372705</td>\n",
       "      <td>2015-10-03</td>\n",
       "      <td>5</td>\n",
       "      <td>Kara Swanson</td>\n",
       "      <td>/user/show/30110097-kara-swanson</td>\n",
       "      <td>What can I say? Its a classic for a reason. Th...</td>\n",
       "      <td>9</td>\n",
       "      <td>default</td>\n",
       "      <td>[magical-classics]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4286</th>\n",
       "      <td>1934.Little_Women</td>\n",
       "      <td>1934</td>\n",
       "      <td>Little Women</td>\n",
       "      <td>https://www.goodreads.com/review/show/1736223734</td>\n",
       "      <td>1736223734</td>\n",
       "      <td>2016-08-22</td>\n",
       "      <td>4</td>\n",
       "      <td>Rosemarie</td>\n",
       "      <td>/user/show/49876976-rosemarie</td>\n",
       "      <td>This charming book is deservedly a classic. Th...</td>\n",
       "      <td>9</td>\n",
       "      <td>default</td>\n",
       "      <td>[guardian-1000, rosemarie-19th-century-10]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4287</th>\n",
       "      <td>1934.Little_Women</td>\n",
       "      <td>1934</td>\n",
       "      <td>Little Women</td>\n",
       "      <td>https://www.goodreads.com/review/show/2274554034</td>\n",
       "      <td>2274554034</td>\n",
       "      <td>2018-01-27</td>\n",
       "      <td>5</td>\n",
       "      <td>Rosemary Atwell</td>\n",
       "      <td>/user/show/15847499-rosemary-atwell</td>\n",
       "      <td>A twenty-year revisitation of another of my v...</td>\n",
       "      <td>9</td>\n",
       "      <td>default</td>\n",
       "      <td>[re-reads]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4288</th>\n",
       "      <td>1934.Little_Women</td>\n",
       "      <td>1934</td>\n",
       "      <td>Little Women</td>\n",
       "      <td>https://www.goodreads.com/review/show/1617253855</td>\n",
       "      <td>1617253855</td>\n",
       "      <td>2018-12-26</td>\n",
       "      <td>4</td>\n",
       "      <td>Julia (Shakespeare and Such)</td>\n",
       "      <td>/user/show/19086853-julia-shakespeare-and-such</td>\n",
       "      <td>4.6/5 stars! Reread and I’m surprised that my ...</td>\n",
       "      <td>9</td>\n",
       "      <td>default</td>\n",
       "      <td>[classics, middle-grade]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>4289 rows × 13 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                          book_id_title  book_id  \\\n",
       "0     320.One_Hundred_Years_of_Solitude      320   \n",
       "1     320.One_Hundred_Years_of_Solitude      320   \n",
       "2     320.One_Hundred_Years_of_Solitude      320   \n",
       "3     320.One_Hundred_Years_of_Solitude      320   \n",
       "4     320.One_Hundred_Years_of_Solitude      320   \n",
       "...                                 ...      ...   \n",
       "4284                  1934.Little_Women     1934   \n",
       "4285                  1934.Little_Women     1934   \n",
       "4286                  1934.Little_Women     1934   \n",
       "4287                  1934.Little_Women     1934   \n",
       "4288                  1934.Little_Women     1934   \n",
       "\n",
       "                         book_title  \\\n",
       "0     One Hundred Years of Solitude   \n",
       "1     One Hundred Years of Solitude   \n",
       "2     One Hundred Years of Solitude   \n",
       "3     One Hundred Years of Solitude   \n",
       "4     One Hundred Years of Solitude   \n",
       "...                             ...   \n",
       "4284                   Little Women   \n",
       "4285                   Little Women   \n",
       "4286                   Little Women   \n",
       "4287                   Little Women   \n",
       "4288                   Little Women   \n",
       "\n",
       "                                            review_url   review_id       date  \\\n",
       "0       https://www.goodreads.com/review/show/14889645    14889645 2008-02-08   \n",
       "1       https://www.goodreads.com/review/show/42810714    42810714 2009-01-12   \n",
       "2       https://www.goodreads.com/review/show/11478967    11478967 2008-01-02   \n",
       "3      https://www.goodreads.com/review/show/977161604   977161604 2014-06-25   \n",
       "4       https://www.goodreads.com/review/show/24243058    24243058 2008-06-11   \n",
       "...                                                ...         ...        ...   \n",
       "4284  https://www.goodreads.com/review/show/1689777505  1689777505 2019-12-11   \n",
       "4285  https://www.goodreads.com/review/show/1407372705  1407372705 2015-10-03   \n",
       "4286  https://www.goodreads.com/review/show/1736223734  1736223734 2016-08-22   \n",
       "4287  https://www.goodreads.com/review/show/2274554034  2274554034 2018-01-27   \n",
       "4288  https://www.goodreads.com/review/show/1617253855  1617253855 2018-12-26   \n",
       "\n",
       "     rating                     user_name  \\\n",
       "0         3                         Chris   \n",
       "1         5                           Meg   \n",
       "2         1                          Adam   \n",
       "3         5                          Lisa   \n",
       "4         1                         Laura   \n",
       "...     ...                           ...   \n",
       "4284      3                          Leah   \n",
       "4285      5                  Kara Swanson   \n",
       "4286      4                     Rosemarie   \n",
       "4287      5               Rosemary Atwell   \n",
       "4288      4  Julia (Shakespeare and Such)   \n",
       "\n",
       "                                            user_url  \\\n",
       "0                            /user/show/858949-chris   \n",
       "1                             /user/show/1009267-meg   \n",
       "2                             /user/show/735606-adam   \n",
       "3                           /user/show/32532774-lisa   \n",
       "4                           /user/show/1040930-laura   \n",
       "...                                              ...   \n",
       "4284                         /user/show/9846570-leah   \n",
       "4285                /user/show/30110097-kara-swanson   \n",
       "4286                   /user/show/49876976-rosemarie   \n",
       "4287             /user/show/15847499-rosemary-atwell   \n",
       "4288  /user/show/19086853-julia-shakespeare-and-such   \n",
       "\n",
       "                                                   text  num_likes sort_order  \\\n",
       "0     Revised 28 March 2012Huh? Oh. Oh, man. Wow.I j...       1527    default   \n",
       "1     I guarantee that 95% of you will hate this boo...       1833    default   \n",
       "2     So I know that I'm supposed to like this book ...       1342    default   \n",
       "3     \"What is your favourite book, mum?\" How many t...        592    default   \n",
       "4     More like A Hundred Years of Torture. I read t...        549    default   \n",
       "...                                                 ...        ...        ...   \n",
       "4284  This book is huge! It was originally published...          9    default   \n",
       "4285  What can I say? Its a classic for a reason. Th...          9    default   \n",
       "4286  This charming book is deservedly a classic. Th...          9    default   \n",
       "4287   A twenty-year revisitation of another of my v...          9    default   \n",
       "4288  4.6/5 stars! Reread and I’m surprised that my ...          9    default   \n",
       "\n",
       "                                                shelves  \n",
       "0                                             [fantasy]  \n",
       "1                                                    []  \n",
       "2                                            [classics]  \n",
       "3     [favorites, havanas-en-masse, unforgettable, n...  \n",
       "4                                                    []  \n",
       "...                                                 ...  \n",
       "4284                                                 []  \n",
       "4285                                 [magical-classics]  \n",
       "4286         [guardian-1000, rosemarie-19th-century-10]  \n",
       "4287                                         [re-reads]  \n",
       "4288                           [classics, middle-grade]  \n",
       "\n",
       "[4289 rows x 13 columns]"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "reviews_df = pd.read_json('classic_book_reviews/all_reviews.json')\n",
    "reviews_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "How many Goodreads reviews did we collect for each book?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "To Kill a Mockingbird                          300\n",
       "The Iliad                                      300\n",
       "Wuthering Heights                              300\n",
       "One Hundred Years of Solitude                  300\n",
       "The Scarlet Letter                             300\n",
       "Narrative of the Life of Frederick Douglass    300\n",
       "The Left Hand of Darkness                      300\n",
       "The Bluest Eye                                 300\n",
       "Jane Eyre                                      300\n",
       "Mrs. Dalloway                                  300\n",
       "Little Women                                   300\n",
       "Pride and Prejudice                            300\n",
       "The Handmaid's Tale                            300\n",
       "The Catcher in the Rye                         299\n",
       "The Great Gatsby                                90\n",
       "Name: book_title, dtype: int64"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "reviews_df['book_title'].value_counts()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "How many Goodreads reviews did we collect in total?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4289"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(reviews_df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Count duplicates"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "How many duplicate reviews are in this data?"
   ]
  },
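  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# keep=False flags every copy of a duplicated review, so a review posted twice counts as two rows\n",
    "reviews_df.duplicated(subset=['book_title', 'text', 'user_name', 'user_url'], keep=False).sum()"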
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's examine the duplicated rows. Duplicates can come from spam or bots, or from a user posting the same review more than once:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>book_id_title</th>\n",
       "      <th>book_id</th>\n",
       "      <th>book_title</th>\n",
       "      <th>review_url</th>\n",
       "      <th>review_id</th>\n",
       "      <th>date</th>\n",
       "      <th>rating</th>\n",
       "      <th>user_name</th>\n",
       "      <th>user_url</th>\n",
       "      <th>text</th>\n",
       "      <th>num_likes</th>\n",
       "      <th>sort_order</th>\n",
       "      <th>shelves</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>3179</th>\n",
       "      <td>11337.The_Bluest_Eye</td>\n",
       "      <td>11337</td>\n",
       "      <td>The Bluest Eye</td>\n",
       "      <td>https://www.goodreads.com/review/show/3437328601</td>\n",
       "      <td>3437328601</td>\n",
       "      <td>2020-07-11</td>\n",
       "      <td>5</td>\n",
       "      <td>Amanda Hupe</td>\n",
       "      <td>/user/show/11658047-amanda-hupe</td>\n",
       "      <td>“Certain seeds it will not nurture, certain fr...</td>\n",
       "      <td>9</td>\n",
       "      <td>default</td>\n",
       "      <td>[2020-reads]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3246</th>\n",
       "      <td>11337.The_Bluest_Eye</td>\n",
       "      <td>11337</td>\n",
       "      <td>The Bluest Eye</td>\n",
       "      <td>https://www.goodreads.com/review/show/3420710517</td>\n",
       "      <td>3420710517</td>\n",
       "      <td>2020-07-08</td>\n",
       "      <td>5</td>\n",
       "      <td>Amanda Hupe</td>\n",
       "      <td>/user/show/11658047-amanda-hupe</td>\n",
       "      <td>“Certain seeds it will not nurture, certain fr...</td>\n",
       "      <td>5</td>\n",
       "      <td>default</td>\n",
       "      <td>[2020-reads]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             book_id_title  book_id      book_title  \\\n",
       "3179  11337.The_Bluest_Eye    11337  The Bluest Eye   \n",
       "3246  11337.The_Bluest_Eye    11337  The Bluest Eye   \n",
       "\n",
       "                                            review_url   review_id       date  \\\n",
       "3179  https://www.goodreads.com/review/show/3437328601  3437328601 2020-07-11   \n",
       "3246  https://www.goodreads.com/review/show/3420710517  3420710517 2020-07-08   \n",
       "\n",
       "     rating    user_name                         user_url  \\\n",
       "3179      5  Amanda Hupe  /user/show/11658047-amanda-hupe   \n",
       "3246      5  Amanda Hupe  /user/show/11658047-amanda-hupe   \n",
       "\n",
       "                                                   text  num_likes sort_order  \\\n",
       "3179  “Certain seeds it will not nurture, certain fr...          9    default   \n",
       "3246  “Certain seeds it will not nurture, certain fr...          5    default   \n",
       "\n",
       "           shelves  \n",
       "3179  [2020-reads]  \n",
       "3246  [2020-reads]  "
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "reviews_df[reviews_df[['book_title','text', 'user_name', 'user_url']].duplicated(keep=False)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Drop the duplicate reviews, keeping the first occurrence of each:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Remove entire duplicate rows, judged on these four columns, keeping the first copy\n",
    "reviews_df = reviews_df.drop_duplicates(subset=['book_title', 'text', 'user_name', 'user_url'], keep='first')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>book_id_title</th>\n",
       "      <th>book_id</th>\n",
       "      <th>book_title</th>\n",
       "      <th>review_url</th>\n",
       "      <th>review_id</th>\n",
       "      <th>date</th>\n",
       "      <th>rating</th>\n",
       "      <th>user_name</th>\n",
       "      <th>user_url</th>\n",
       "      <th>text</th>\n",
       "      <th>num_likes</th>\n",
       "      <th>sort_order</th>\n",
       "      <th>shelves</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Empty DataFrame\n",
       "Columns: [book_id_title, book_id, book_title, review_url, review_id, date, rating, user_name, user_url, text, num_likes, sort_order, shelves]\n",
       "Index: []"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "reviews_df[reviews_df[['book_title','text', 'user_name', 'user_url']].duplicated(keep=False)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Write the de-duplicated reviews to a CSV file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [],
   "source": [
    "reviews_df.to_csv(\"all_goodreads_reviews.csv\", encoding='utf-8', index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Collect Newest Goodreads Reviews (Firefox)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Make new directory for book reviews"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Make a new folder for the newest Goodreads reviews:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir classic_book_reviews_newest"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Run Goodreads reviews collection script"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Below we run `get_reviews.py` with `--sort_order` set to `newest` to collect the most recently published Goodreads reviews. We also select Firefox as the web browser and request the aggregated reviews as a CSV file (in addition to a JSON file)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2020-10-16 14:26:51.613647 get_reviews.py: Scraping 4671.The_Great_Gatsby...\n",
      "2020-10-16 14:26:51.613765 get_reviews.py: #3 out of 15 books\n",
      "Scraped page 1\n",
      "Scraped page 2\n",
      "Scraped page 3\n",
      "Scraped page 4\n",
      "Scraped page 5\n",
      "Scraped page 6\n",
      "Scraped page 7\n",
      "Scraped page 8\n",
      "Scraped page 9\n",
      "Scraped page 10\n",
      "2020-10-16 14:27:52.671204 get_reviews.py: Scraped ✨300✨ reviews for 4671.The_Great_Gatsby\n",
      "=============================\n",
      "2020-10-16 14:27:52.683197 get_reviews.py: Scraping 10210.Jane_Eyre...\n",
      "2020-10-16 14:27:52.683215 get_reviews.py: #4 out of 15 books\n",
      "🚨 ElementClickInterceptedException 🚨\n",
      "Refreshing Goodreads site and rescraping book\n",
      "Scraped page 1\n",
      "Scraped page 2\n",
      "Scraped page 3\n",
      "Scraped page 4\n",
      "Scraped page 5\n",
      "Scraped page 6\n",
      "Scraped page 7\n",
      "Scraped page 8\n",
      "Scraped page 9\n",
      "Scraped page 10\n",
      "2020-10-16 14:28:59.189851 get_reviews.py: Scraped ✨300✨ reviews for 10210.Jane_Eyre\n",
      "=============================\n",
      "2020-10-16 14:28:59.212523 get_reviews.py: Scraping 1371.The_Iliad...\n",
      "2020-10-16 14:28:59.212545 get_reviews.py: #5 out of 15 books\n",
      "Scraped page 1\n",
      "Scraped page 2\n",
      "Scraped page 3\n",
      "Scraped page 4\n",
      "Scraped page 5\n",
      "Scraped page 6\n",
      "Scraped page 7\n",
      "Scraped page 8\n",
      "Scraped page 9\n",
      "Scraped page 10\n",
      "2020-10-16 14:29:55.437714 get_reviews.py: Scraped ✨299✨ reviews for 1371.The_Iliad\n",
      "=============================\n",
      "2020-10-16 14:29:55.461226 get_reviews.py: Scraping 6185.Wuthering_Heights...\n",
      "2020-10-16 14:29:55.461250 get_reviews.py: #6 out of 15 books\n",
      "Scraped page 1\n",
      "Scraped page 2\n",
      "Scraped page 3\n",
      "Scraped page 4\n",
      "Scraped page 5\n",
      "Scraped page 6\n",
      "Scraped page 7\n",
      "Scraped page 8\n",
      "Scraped page 9\n",
      "Scraped page 10\n",
      "2020-10-16 14:30:51.039187 get_reviews.py: Scraped ✨300✨ reviews for 6185.Wuthering_Heights\n",
      "=============================\n",
      "2020-10-16 14:30:51.057336 get_reviews.py: Scraping 5107.The_Catcher_in_the_Rye...\n",
      "2020-10-16 14:30:51.057376 get_reviews.py: #7 out of 15 books\n",
      "Scraped page 1\n",
      "Scraped page 2\n",
      "Scraped page 3\n",
      "Scraped page 4\n",
      "Scraped page 5\n",
      "Scraped page 6\n",
      "Scraped page 7\n",
      "Scraped page 8\n",
      "Scraped page 9\n",
      "Scraped page 10\n",
      "2020-10-16 14:31:44.113949 get_reviews.py: Scraped ✨300✨ reviews for 5107.The_Catcher_in_the_Rye\n",
      "=============================\n",
      "2020-10-16 14:31:44.130292 get_reviews.py: Scraping 11337.The_Bluest_Eye...\n",
      "2020-10-16 14:31:44.130314 get_reviews.py: #8 out of 15 books\n",
      "Scraped page 1\n",
      "Scraped page 2\n",
      "Scraped page 3\n",
      "Scraped page 4\n",
      "Scraped page 5\n",
      "Scraped page 6\n",
      "Scraped page 7\n",
      "Scraped page 8\n",
      "Scraped page 9\n",
      "Scraped page 10\n",
      "2020-10-16 14:32:38.841013 get_reviews.py: Scraped ✨300✨ reviews for 11337.The_Bluest_Eye\n",
      "=============================\n",
      "2020-10-16 14:32:38.857938 get_reviews.py: Scraping 320.One_Hundred_Years_of_Solitude...\n",
      "2020-10-16 14:32:38.857961 get_reviews.py: #9 out of 15 books\n",
      "Scraped page 1\n",
      "Scraped page 2\n",
      "Scraped page 3\n",
      "Scraped page 4\n",
      "Scraped page 5\n",
      "Scraped page 6\n",
      "Scraped page 7\n",
      "Scraped page 8\n",
      "Scraped page 9\n",
      "Scraped page 10\n",
      "2020-10-16 14:33:35.687842 get_reviews.py: Scraped ✨300✨ reviews for 320.One_Hundred_Years_of_Solitude\n",
      "=============================\n",
      "2020-10-16 14:33:35.700831 get_reviews.py: Scraping 36529.Narrative_of_the_Life_of_Frederick_Douglass...\n",
      "2020-10-16 14:33:35.700858 get_reviews.py: #10 out of 15 books\n",
      "Scraped page 1\n",
      "Scraped page 2\n",
      "Scraped page 3\n",
      "Scraped page 4\n",
      "Scraped page 5\n",
      "Scraped page 6\n",
      "Scraped page 7\n",
      "Scraped page 8\n",
      "Scraped page 9\n",
      "Scraped page 10\n",
      "2020-10-16 14:34:30.543793 get_reviews.py: Scraped ✨300✨ reviews for 36529.Narrative_of_the_Life_of_Frederick_Douglass\n",
      "=============================\n",
      "2020-10-16 14:34:30.557592 get_reviews.py: Scraping 1934.Little_Women...\n",
      "2020-10-16 14:34:30.557610 get_reviews.py: #11 out of 15 books\n",
      "Scraped page 1\n",
      "Scraped page 2\n",
      "Scraped page 3\n",
      "Scraped page 4\n",
      "Scraped page 5\n",
      "Scraped page 6\n",
      "Scraped page 7\n",
      "Scraped page 8\n",
      "Scraped page 9\n",
      "Scraped page 10\n",
      "2020-10-16 14:35:27.850384 get_reviews.py: Scraped ✨300✨ reviews for 1934.Little_Women\n",
      "=============================\n",
      "2020-10-16 14:35:27.861498 get_reviews.py: Scraping 12296.The_Scarlet_Letter...\n",
      "2020-10-16 14:35:27.861516 get_reviews.py: #12 out of 15 books\n",
      "Scraped page 1\n",
      "Scraped page 2\n",
      "Scraped page 3\n",
      "Scraped page 4\n",
      "Scraped page 5\n",
      "Scraped page 6\n",
      "Scraped page 7\n",
      "Scraped page 8\n",
      "Scraped page 9\n",
      "Scraped page 10\n",
      "2020-10-16 14:36:22.745938 get_reviews.py: Scraped ✨300✨ reviews for 12296.The_Scarlet_Letter\n",
      "=============================\n",
      "2020-10-16 14:36:22.767726 get_reviews.py: Scraping 18423.The_Left_Hand_of_Darkness...\n",
      "2020-10-16 14:36:22.767752 get_reviews.py: #13 out of 15 books\n",
      "Scraped page 1\n",
      "Scraped page 2\n",
      "Scraped page 3\n",
      "Scraped page 4\n",
      "Scraped page 5\n",
      "Scraped page 6\n",
      "Scraped page 7\n",
      "Scraped page 8\n",
      "Scraped page 9\n",
      "Scraped page 10\n",
      "2020-10-16 14:37:16.797874 get_reviews.py: Scraped ✨300✨ reviews for 18423.The_Left_Hand_of_Darkness\n",
      "=============================\n",
      "2020-10-16 14:37:16.827592 get_reviews.py: Scraping 14942.Mrs_Dalloway...\n",
      "2020-10-16 14:37:16.827619 get_reviews.py: #14 out of 15 books\n",
      "Scraped page 1\n",
      "Scraped page 2\n",
      "Scraped page 3\n",
      "Scraped page 4\n",
      "Scraped page 5\n",
      "Scraped page 6\n",
      "Scraped page 7\n",
      "Scraped page 8\n",
      "Scraped page 9\n",
      "Scraped page 10\n",
      "2020-10-16 14:38:11.189933 get_reviews.py: Scraped ✨300✨ reviews for 14942.Mrs_Dalloway\n",
      "=============================\n",
      "2020-10-16 14:38:11.207819 get_reviews.py: Scraping 38447.The_Handmaid_s_Tale...\n",
      "2020-10-16 14:38:11.207845 get_reviews.py: #15 out of 15 books\n",
      "Scraped page 1\n",
      "Scraped page 2\n",
      "Scraped page 3\n",
      "Scraped page 4\n",
      "Scraped page 5\n",
      "Scraped page 6\n",
      "Scraped page 7\n",
      "Scraped page 8\n",
      "Scraped page 9\n",
      "Scraped page 10\n",
      "2020-10-16 14:39:05.388243 get_reviews.py: Scraped ✨300✨ reviews for 38447.The_Handmaid_s_Tale\n",
      "=============================\n",
      "2020-10-16 14:39:08.130923 get_reviews.py:\n",
      "\n",
      "🎉 Success! All book reviews scraped. 🎉\n",
      "\n",
      "Goodreads review files have been output to /classic_book_reviews_newest\n",
      "Goodreads scraping run time = ⏰ 0:12:21.264886 ⏰\n"
     ]
    }
   ],
   "source": [
    "!python get_reviews.py --book_ids_path goodreads_classics_sample.txt \\\n",
    "--output_directory_path classic_book_reviews_newest --sort_order newest --browser firefox --format csv"
   ]
  },
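  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you would rather launch the script from Python than type the shell command by hand, the command above can be assembled as a list of arguments. The cell below is only a sketch: `build_get_reviews_command` is a hypothetical helper (it is not part of the Goodreads Scraper), and it uses only the flags that appear in the command above. It builds and prints the command without running it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "\n",
    "\n",
    "def build_get_reviews_command(book_ids_path, output_directory_path,\n",
    "                              sort_order='newest', browser='firefox',\n",
    "                              file_format='csv'):\n",
    "    # Hypothetical helper: return the get_reviews.py call as an argument list.\n",
    "    return [\n",
    "        sys.executable, 'get_reviews.py',\n",
    "        '--book_ids_path', book_ids_path,\n",
    "        '--output_directory_path', output_directory_path,\n",
    "        '--sort_order', sort_order,\n",
    "        '--browser', browser,\n",
    "        '--format', file_format,\n",
    "    ]\n",
    "\n",
    "\n",
    "command = build_get_reviews_command('goodreads_classics_sample.txt',\n",
    "                                    'classic_book_reviews_newest')\n",
    "\n",
    "# Show the arguments that would be passed to the Python interpreter.\n",
    "print(' '.join(command[1:]))\n",
    "\n",
    "# To actually start the scrape, pass the list to subprocess:\n",
    "# import subprocess\n",
    "# subprocess.run(command, check=True)"
   ]
  },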
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's read in the aggregated CSV file with Pandas and see what the data looks like:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "newest_reviews = pd.read_csv('classic_book_reviews_newest/all_reviews.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>book_id_title</th>\n",
       "      <th>book_id</th>\n",
       "      <th>book_title</th>\n",
       "      <th>review_url</th>\n",
       "      <th>review_id</th>\n",
       "      <th>date</th>\n",
       "      <th>rating</th>\n",
       "      <th>user_name</th>\n",
       "      <th>user_url</th>\n",
       "      <th>text</th>\n",
       "      <th>num_likes</th>\n",
       "      <th>sort_order</th>\n",
       "      <th>shelves</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>320.One_Hundred_Years_of_Solitude</td>\n",
       "      <td>320</td>\n",
       "      <td>One Hundred Years of Solitude</td>\n",
       "      <td>https://www.goodreads.com/review/show/3572683523</td>\n",
       "      <td>3572683523</td>\n",
       "      <td>2020-10-16</td>\n",
       "      <td>4.0</td>\n",
       "      <td>Gitte Hornung</td>\n",
       "      <td>/user/show/63555571-gitte-hornung</td>\n",
       "      <td>Phew this was a long book. At one point I cons...</td>\n",
       "      <td>0</td>\n",
       "      <td>newest</td>\n",
       "      <td>[]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>320.One_Hundred_Years_of_Solitude</td>\n",
       "      <td>320</td>\n",
       "      <td>One Hundred Years of Solitude</td>\n",
       "      <td>https://www.goodreads.com/review/show/3598284436</td>\n",
       "      <td>3598284436</td>\n",
       "      <td>2020-10-16</td>\n",
       "      <td>5.0</td>\n",
       "      <td>Rob Jacobs</td>\n",
       "      <td>/user/show/114599362-rob-jacobs</td>\n",
       "      <td>Great journey!</td>\n",
       "      <td>0</td>\n",
       "      <td>newest</td>\n",
       "      <td>[]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>320.One_Hundred_Years_of_Solitude</td>\n",
       "      <td>320</td>\n",
       "      <td>One Hundred Years of Solitude</td>\n",
       "      <td>https://www.goodreads.com/review/show/3582954889</td>\n",
       "      <td>3582954889</td>\n",
       "      <td>2020-10-16</td>\n",
       "      <td>2.0</td>\n",
       "      <td>Rona Karen</td>\n",
       "      <td>/user/show/6864201-rona-karen</td>\n",
       "      <td>Two stars for now. The repetitive names annoye...</td>\n",
       "      <td>0</td>\n",
       "      <td>newest</td>\n",
       "      <td>[]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>320.One_Hundred_Years_of_Solitude</td>\n",
       "      <td>320</td>\n",
       "      <td>One Hundred Years of Solitude</td>\n",
       "      <td>https://www.goodreads.com/review/show/2122452416</td>\n",
       "      <td>2122452416</td>\n",
       "      <td>2020-10-15</td>\n",
       "      <td>4.0</td>\n",
       "      <td>Emma Pek</td>\n",
       "      <td>/user/show/53126830-emma-pek</td>\n",
       "      <td>The story was timeless and full of excitement....</td>\n",
       "      <td>0</td>\n",
       "      <td>newest</td>\n",
       "      <td>[]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>320.One_Hundred_Years_of_Solitude</td>\n",
       "      <td>320</td>\n",
       "      <td>One Hundred Years of Solitude</td>\n",
       "      <td>https://www.goodreads.com/review/show/3541262010</td>\n",
       "      <td>3541262010</td>\n",
       "      <td>2020-10-15</td>\n",
       "      <td>4.0</td>\n",
       "      <td>Irini Kallides</td>\n",
       "      <td>/user/show/113080501-irini-kallides</td>\n",
       "      <td>Amazed about the richness of his language and ...</td>\n",
       "      <td>0</td>\n",
       "      <td>newest</td>\n",
       "      <td>['book-club']</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4493</th>\n",
       "      <td>1934.Little_Women</td>\n",
       "      <td>1934</td>\n",
       "      <td>Little Women</td>\n",
       "      <td>https://www.goodreads.com/review/show/3508929611</td>\n",
       "      <td>3508929611</td>\n",
       "      <td>2020-09-14</td>\n",
       "      <td>4.0</td>\n",
       "      <td>Michael</td>\n",
       "      <td>/user/show/108824026-michael</td>\n",
       "      <td>This was a great book despite how hard it was ...</td>\n",
       "      <td>0</td>\n",
       "      <td>newest</td>\n",
       "      <td>[]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4494</th>\n",
       "      <td>1934.Little_Women</td>\n",
       "      <td>1934</td>\n",
       "      <td>Little Women</td>\n",
       "      <td>https://www.goodreads.com/review/show/3486426118</td>\n",
       "      <td>3486426118</td>\n",
       "      <td>2020-09-14</td>\n",
       "      <td>2.0</td>\n",
       "      <td>Deanna</td>\n",
       "      <td>/user/show/43177294-deanna</td>\n",
       "      <td>Wow that did not live up to the hype, but I be...</td>\n",
       "      <td>0</td>\n",
       "      <td>newest</td>\n",
       "      <td>[]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4495</th>\n",
       "      <td>1934.Little_Women</td>\n",
       "      <td>1934</td>\n",
       "      <td>Little Women</td>\n",
       "      <td>https://www.goodreads.com/review/show/3532663154</td>\n",
       "      <td>3532663154</td>\n",
       "      <td>2020-09-14</td>\n",
       "      <td>2.0</td>\n",
       "      <td>Kelly</td>\n",
       "      <td>/user/show/8545345-kelly</td>\n",
       "      <td>I listened to 'Little Women' as an audiobook a...</td>\n",
       "      <td>0</td>\n",
       "      <td>newest</td>\n",
       "      <td>[]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4496</th>\n",
       "      <td>1934.Little_Women</td>\n",
       "      <td>1934</td>\n",
       "      <td>Little Women</td>\n",
       "      <td>https://www.goodreads.com/review/show/3359606301</td>\n",
       "      <td>3359606301</td>\n",
       "      <td>2020-09-14</td>\n",
       "      <td>2.0</td>\n",
       "      <td>Ashley Brooks</td>\n",
       "      <td>/user/show/17334972-ashley-brooks</td>\n",
       "      <td>Reading this as an adult was a different exper...</td>\n",
       "      <td>0</td>\n",
       "      <td>newest</td>\n",
       "      <td>[]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4497</th>\n",
       "      <td>1934.Little_Women</td>\n",
       "      <td>1934</td>\n",
       "      <td>Little Women</td>\n",
       "      <td>https://www.goodreads.com/review/show/2459445140</td>\n",
       "      <td>2459445140</td>\n",
       "      <td>2020-09-14</td>\n",
       "      <td>4.0</td>\n",
       "      <td>Khryzette Onishi</td>\n",
       "      <td>/user/show/10999719-khryzette-onishi</td>\n",
       "      <td>Rating: 4.5 stars.A coming of age in its simpl...</td>\n",
       "      <td>0</td>\n",
       "      <td>newest</td>\n",
       "      <td>[]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>4498 rows × 13 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                          book_id_title  book_id  \\\n",
       "0     320.One_Hundred_Years_of_Solitude      320   \n",
       "1     320.One_Hundred_Years_of_Solitude      320   \n",
       "2     320.One_Hundred_Years_of_Solitude      320   \n",
       "3     320.One_Hundred_Years_of_Solitude      320   \n",
       "4     320.One_Hundred_Years_of_Solitude      320   \n",
       "...                                 ...      ...   \n",
       "4493                  1934.Little_Women     1934   \n",
       "4494                  1934.Little_Women     1934   \n",
       "4495                  1934.Little_Women     1934   \n",
       "4496                  1934.Little_Women     1934   \n",
       "4497                  1934.Little_Women     1934   \n",
       "\n",
       "                         book_title  \\\n",
       "0     One Hundred Years of Solitude   \n",
       "1     One Hundred Years of Solitude   \n",
       "2     One Hundred Years of Solitude   \n",
       "3     One Hundred Years of Solitude   \n",
       "4     One Hundred Years of Solitude   \n",
       "...                             ...   \n",
       "4493                   Little Women   \n",
       "4494                   Little Women   \n",
       "4495                   Little Women   \n",
       "4496                   Little Women   \n",
       "4497                   Little Women   \n",
       "\n",
       "                                            review_url   review_id  \\\n",
       "0     https://www.goodreads.com/review/show/3572683523  3572683523   \n",
       "1     https://www.goodreads.com/review/show/3598284436  3598284436   \n",
       "2     https://www.goodreads.com/review/show/3582954889  3582954889   \n",
       "3     https://www.goodreads.com/review/show/2122452416  2122452416   \n",
       "4     https://www.goodreads.com/review/show/3541262010  3541262010   \n",
       "...                                                ...         ...   \n",
       "4493  https://www.goodreads.com/review/show/3508929611  3508929611   \n",
       "4494  https://www.goodreads.com/review/show/3486426118  3486426118   \n",
       "4495  https://www.goodreads.com/review/show/3532663154  3532663154   \n",
       "4496  https://www.goodreads.com/review/show/3359606301  3359606301   \n",
       "4497  https://www.goodreads.com/review/show/2459445140  2459445140   \n",
       "\n",
       "            date  rating         user_name  \\\n",
       "0     2020-10-16     4.0     Gitte Hornung   \n",
       "1     2020-10-16     5.0        Rob Jacobs   \n",
       "2     2020-10-16     2.0        Rona Karen   \n",
       "3     2020-10-15     4.0          Emma Pek   \n",
       "4     2020-10-15     4.0    Irini Kallides   \n",
       "...          ...     ...               ...   \n",
       "4493  2020-09-14     4.0           Michael   \n",
       "4494  2020-09-14     2.0            Deanna   \n",
       "4495  2020-09-14     2.0             Kelly   \n",
       "4496  2020-09-14     2.0     Ashley Brooks   \n",
       "4497  2020-09-14     4.0  Khryzette Onishi   \n",
       "\n",
       "                                  user_url  \\\n",
       "0        /user/show/63555571-gitte-hornung   \n",
       "1          /user/show/114599362-rob-jacobs   \n",
       "2            /user/show/6864201-rona-karen   \n",
       "3             /user/show/53126830-emma-pek   \n",
       "4      /user/show/113080501-irini-kallides   \n",
       "...                                    ...   \n",
       "4493          /user/show/108824026-michael   \n",
       "4494            /user/show/43177294-deanna   \n",
       "4495              /user/show/8545345-kelly   \n",
       "4496     /user/show/17334972-ashley-brooks   \n",
       "4497  /user/show/10999719-khryzette-onishi   \n",
       "\n",
       "                                                   text  num_likes sort_order  \\\n",
       "0     Phew this was a long book. At one point I cons...          0     newest   \n",
       "1                                        Great journey!          0     newest   \n",
       "2     Two stars for now. The repetitive names annoye...          0     newest   \n",
       "3     The story was timeless and full of excitement....          0     newest   \n",
       "4     Amazed about the richness of his language and ...          0     newest   \n",
       "...                                                 ...        ...        ...   \n",
       "4493  This was a great book despite how hard it was ...          0     newest   \n",
       "4494  Wow that did not live up to the hype, but I be...          0     newest   \n",
       "4495  I listened to 'Little Women' as an audiobook a...          0     newest   \n",
       "4496  Reading this as an adult was a different exper...          0     newest   \n",
       "4497  Rating: 4.5 stars.A coming of age in its simpl...          0     newest   \n",
       "\n",
       "            shelves  \n",
       "0                []  \n",
       "1                []  \n",
       "2                []  \n",
       "3                []  \n",
       "4     ['book-club']  \n",
       "...             ...  \n",
       "4493             []  \n",
       "4494             []  \n",
       "4495             []  \n",
       "4496             []  \n",
       "4497             []  \n",
       "\n",
       "[4498 rows x 13 columns]"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "newest_reviews"
   ]
  },
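  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One thing to watch for: because the reviews were read back from a CSV file, each value in the `shelves` column is a string such as `['book-club']` rather than a Python list. The cell below is a minimal sketch of converting those strings back into lists with `ast.literal_eval` from the standard library. `parse_shelves` is a hypothetical helper name, and the sketch assumes the values look like the ones displayed above; anything it cannot parse becomes an empty list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import ast\n",
    "\n",
    "\n",
    "def parse_shelves(value):\n",
    "    # Turn the text of a list (as stored in the CSV) back into a Python list.\n",
    "    try:\n",
    "        parsed = ast.literal_eval(value)\n",
    "    except (ValueError, SyntaxError, TypeError):\n",
    "        # Missing or malformed values fall back to an empty list.\n",
    "        return []\n",
    "    return list(parsed) if isinstance(parsed, (list, tuple)) else []\n",
    "\n",
    "\n",
    "print(parse_shelves(\"['book-club']\"))\n",
    "print(parse_shelves('[]'))\n",
    "\n",
    "# With the DataFrame above, the helper could be applied like this:\n",
    "# newest_reviews['shelves'] = newest_reviews['shelves'].apply(parse_shelves)"
   ]
  },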
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "How many Goodreads reviews did we collect for each book?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "To Kill a Mockingbird                          300\n",
       "The Great Gatsby                               300\n",
       "Wuthering Heights                              300\n",
       "One Hundred Years of Solitude                  300\n",
       "The Scarlet Letter                             300\n",
       "Narrative of the Life of Frederick Douglass    300\n",
       "The Left Hand of Darkness                      300\n",
       "The Bluest Eye                                 300\n",
       "Jane Eyre                                      300\n",
       "Mrs. Dalloway                                  300\n",
       "Little Women                                   300\n",
       "The Handmaid's Tale                            300\n",
       "The Catcher in the Rye                         300\n",
       "The Iliad                                      299\n",
       "Pride and Prejudice                            299\n",
       "Name: book_title, dtype: int64"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "newest_reviews['book_title'].value_counts()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "How many Goodreads reviews did we collect in total?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4498"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(newest_reviews)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
